RESULT 1

AC079031

LOCUS AC079031 187901 bp DNA linear HTG 02-OCT-2001

DEFINITION Homo sapiens chromosome 12q clone RP11-503G7, WORKING DRAFT

SEQUENCE, 7 unordered pieces.
ACCESSION AC079031

VERSION AC079031.15 GI:15809112

KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT. 
SOURCE human. 

ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 187901) 
Jackson,L.E., Jacobson,B., Jia,Y., Johnson,R., Jolivet,S.,
TITLE Direct Submission

JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 187901)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (17-AUG-2000) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
COMMENT On Sep 30, 2001 this sequence version replaced gi:14328945.
Genome Center

Center: Baylor College of Medicine

Center code: BCM

Web site: http://www.hgsc.bcm.tmc.edu/

Contact: hgsc-help@bcm.tmc.edu
Project Information 

Center project name: HBQO 

Center clone name: RP11-503G7 
Summary Statistics 

Sequencing vector: Plasmid; M77789 

Sequencing vector: M13; L08821 

Chemistry: Dye-primer Bodipy: 32% of reads 

Chemistry: Dye-terminator Big Dye: 68% of reads 

Assembly program: Phrap; version 0.990329 

Consensus quality: 182139 bases at least Q40 

Consensus quality: 183938 bases at least Q30 

Consensus quality: 184894 bases at least Q20 

Estimated insert size: 185621; sum-of-contigs estimation 

Quality coverage: 0x in Q20 bases; agarose-fp estimation

Quality coverage: 5.4x in Q20 bases; sum-of-contigs estimation
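The consensus-quality lines count bases at or above Phred thresholds. The following is a minimal Python sketch that assumes the standard Phred definition Q = -10 log10(p). It converts each threshold into a per-base error probability and into an upper bound on the expected number of miscalled bases among the counted bases. The counts are copied from this record, and the helper names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse mapping: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


# Counts from the "Consensus quality" lines of this record (AC079031.15).
bases_at_least = {40: 182139, 30: 183938, 20: 184894}

for q, n in sorted(bases_at_least.items(), reverse=True):
    p = phred_to_error_prob(q)
    # Each counted base has error probability <= p, so n * p is an upper
    # bound on the expected number of miscalled bases in that set.
    print(f"Q{q}: p <= {p:g}; expected errors <= {n * p:.1f} of {n} bases")
```

Because the thresholds are lower bounds on quality, the products are upper bounds on expected errors, not estimates of them.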



NOTE: Estimated insert size may differ from sequence length 

(see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
NOTE: This is a 'working draft' sequence. It currently
consists of 7 contigs. The true order of the pieces
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
be preserved. 

contig of 109303 bp in length
gap of unknown length
contig of 15399 bp in length
gap of unknown length
contig of 12761 bp in length

contig of 5688 bp in length
gap of unknown length
contig of 3374 bp in length
gap of unknown length
contig of 3063 bp in length

FEATURES Location/Qualifiers
source 1..187901
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="12q"
/clone="RP11-503G7"
BASE COUNT 38363 a 55880 c 57328 g 35726 t 604 others
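The BASE COUNT entries should account for every position in the LOCUS length. Non-ACGT symbols, such as the runs of N that represent the gaps between contigs, are counted under "others". The following is a small Python check that uses the figures from this record and from the HSU14383 record later in this listing. The function names are illustrative.

```python
def check_base_count(length: int, counts: dict[str, int]) -> bool:
    """True when the BASE COUNT entries add up to the LOCUS length."""
    return sum(counts.values()) == length


def gc_fraction(counts: dict[str, int]) -> float:
    """GC content over the unambiguous bases only ("others" excluded)."""
    acgt = counts["a"] + counts["c"] + counts["g"] + counts["t"]
    return (counts["c"] + counts["g"]) / acgt


ac079031 = {"a": 38363, "c": 55880, "g": 57328, "t": 35726, "others": 604}
hsu14383 = {"a": 254, "c": 490, "g": 413, "t": 246}

print(check_base_count(187901, ac079031), check_base_count(1403, hsu14383))
print(f"GC content of AC079031: {gc_fraction(ac079031):.3f}")
```

A failed check usually points to a transcription error in one of the counts, not to a problem with the sequence itself.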



ORIGIN 



Query Match 39.5%; Score 4067.2; DB 2; Length 187901; 

Best Local Similarity 87.5%; Pred. No. 0; 

Matches 4899; Conservative 0; Mismatches 61; Indels 637; Gaps 16; 
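You can reproduce the reported Best Local Similarity by dividing Matches by the number of aligned columns, taken here to be Matches + Mismatches + Indels. This formula is inferred from the numbers in this listing, not from documentation of the search program. It also assumes that conservative substitutions (0 in every hit shown) contribute nothing. A Python sketch:

```python
def best_local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Percent of aligned columns that are identities.

    Assumes the denominator is matches + mismatches + indels, which
    reproduces the values printed in this listing.
    """
    return 100.0 * matches / (matches + mismatches + indels)


# (Matches, Mismatches, Indels) as reported for hits in this listing.
hits = {
    "AC079031.15": (4899, 61, 637),  # listing reports 87.5%
    "HSU14383": (1100, 39, 12),      # listing reports 95.6%
}
for name, (m, mm, ind) in hits.items():
    print(f"{name}: {best_local_similarity(m, mm, ind):.1f}%")
```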

Qy 3037 gggccttccccagggacagccgatgctctcctgatggctcctgcccttgcagagtgctgc 3096 

| I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I I M I I I II I M I 

Db 175585 GGGCCTCCCCCAGGTACAGCCGATGCTCTCCTGATGGCTCCTGCCCTTGCAGAGTGCTGC 175644 

Qy 3097 ccccgcctgcccacctggcctggaccctcgcctgagccccctcagggctctgcgccacct 3156 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I I 

Db 175645 CCCCGCCTGCCCACCTGGCCTGGACCCTCGCCTGAGCCCCCTCAGGGCTCTGCGCCACCT 175704 

Qy 3157 caacccaggcgtttgttccgcaggaacctcccggctcttcccactcgggaaaggaaggct 3216 

| M M I I I II I M II I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 175705 CAACCCAGGCGTTTGTTCCGCAGGAACCTCCCGGCTCTTCCCACTCGGGAAAGGAAGGCT 175764 

Qy 3217 ctgggcatggaggtcggccaggccccatccccgtaccctggcccttcttcctgcttcctg 3276 

| | | || | | M I I I I I I I II I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 175765 CTGGGCATGGAGGTCGGCCAGGCCCCATCCCCGTACCCTGCCCCTTCTTCCTGCTTCCTG 175824

Qy 3277 tttgtcactgccccggggcctttgcacctgcattccctctctctgtgagtgtcctggggc 3336 

I I I I I I I I I I I I I II II I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 175825 TTTGTCACTGCCCCGGGGCCTTTGCACCTGCATTCCCTCTCTCTGTGAGTGTCCTGGGGC 175884 

Qy 3337 ccgttacccacgtcaccgtcccaggataccttttcttttctttctctctctccagcttta 3396 

I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I I II M I I I I I I I I M M I I I I II I I I II 
Db 175885 CCGTTACCCACGTCACCGTCCCAGGATACCTTTTCTTTTCTTTCTCTCTCTCCAGCTTTA 175944

Qy 3397 ttgaggtatagttgacaattcaggacggtgtgcactcaaggtatgcagcatcacaacctg 3456

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 175945 TTGAGGTATAGTTGACAATTCAGGACGGTGTGCACTCAAGGTATGCAGCATCACAACCTG 176004

Qy 3457 acacacgtaggcattgtgaaatgagtcccacaattgggctaattaacacacccatcacct 3516

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 176005 ACACACGTAGGCATTGTGAAATGAGTCCCACAATTGGGCTAATTAACACACCCATCACCT 176064

Qy 3517 tacatggttacttctttctgtggtgagaacactaaattttaaatagaggacacacagcct 3576

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 176065 TACATGGTTACTTCTTTCTGTGGTGAGAACACTAAATTTTAAATAGAGGACACACAGCCT 176124

Qy 3577 gggcaacatagtgagaccctgtctctacaaatataaaaaaattatctggacgtggtggtg 3636

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 176125 GGGCAACATAGTGAGACCCTGTCTCTACAAATATAAAAAAATTATCTGGACGTGGTGGTG 176184

Qy 3637 cacacctgtggtcccagctacttgggaagctgaggctggagaatcacttgagcctgggag 3696

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 176185 CACACCTGTGGTCCCAGCTACTTGGGAAGCTGAGGCTGGAGAATCACTTGAGCCTGGGAG 176244

Qy 3697 gcggaggttgcggtgcactccagcctgggcgacagagggaggccctatctcaaaataaat 3756

|||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||

Db 176245 GCGGAGGTTGCGGTGCACTCCAGCCTGGGCGACAGAGGGAGG-CCTATCTCAAAATAAAT 176303

Qy 3757 aaataaaggacacattcttatcagctgtagtcaccacgttcattacatcttagaacccgc 3816
I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 



Db 176304 AAATAAAGGACACATTCTTATCAGCTGTAGTCACCACGTTCATTACACCTTAGAACCCGC 176363

Qy 3817 taatctcataactgcacctttgttccctgtgaccctcaactcccggtcccctccagccct 3876 

M I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 176364 TAATCTCATAACTGCACCTTTGTTCCCTGTGACCCTCAACTCCCGGTCCCCTCCAGCCCT 176423 

Qy 3877 gacagccactgttcactctgcttctgtgagttccgctttttcacacgtcactcgagtgag 3936 

| I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I I I I I I I I I I M I I I I I I I 
Db 176424 GACAGCCACTGTTCACTCTGCTTCTGTGAGTTCCGCTTTTTCACACGTCACTCGAGTGAG 176483 

Qy 3937 gccatgtgctgtttgtctttctgtgcctggcttatctcacttaccacaaatgc^cttcag 3996 

| || | | | I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 176484 GCCATGTGCTGTTTGTCTTTCTGTGCCTGGCTTATCTCACTTACCACAAATGC3CTTCAG 176543 

Qy 3997 gttcatcgtgtcctcacaaatggcgggcttgccctgccctgccctgccctgccctccctt 4056 

M I I II I I I I I I II I I I I I I II I I I I I II I I I I I I I I I I I I II I I I I I I I 

Db 176544 GTTCATCGTGTCCTCACAAATGGCGGGCTTGCCCTGCCCTGCCCTGCCCTGCCCTCCCTT 176603

Qy 4057 cccttcccttctctctctctcctttctctctctctggctctctctctct 4105 

M I I I I I I I I I I I II I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I 
Db 176604 CCCTTCCCTTCTCTCTCTCTCCTTTCTCTCTCTCTGTCTCTCTCTCTCTCCCCCCCTTCC 176663

Qy 4106 4105 

Db 176664 CTTTTCCTCCTGTGGAATAACACTCCTGTATGTGTGTGTACGCATGTGTGTGTATACGCG 176723

Qy 4106 4105 

Db 176724 TGTGTGTACGCATGTGTGTGTATACGTGTGTGTACGCATGTGTGTGTATACGTGTGTGTG 176783

Qy 4106 4105 

Db 176784 TACGCATGTGTGTGTATTTCTTCCCTTCCCTCCCCTCCCCTTCCCTCCCCTCCCCTTCCC 176843 

Qy 4106 4105 

Db 176844 TCCCCTCCCCTTCCCTCCCCTCCCCTTCCCTCCCCTCCCCTTCTTTCCCCTCCCCTTCCC 176903 

Qy 4106 4105 

Db 176904 TTTCCCTCCCTGTGGAATAACACTCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTA 176963 

Qy 4106 4105 

Db 176964 TGCATGTGTGTGTATTTCTCCCCTTCCCTTCCTTTCCCTCCCCTCCTCCCTCCCCTCCCT 177023 

Qy 4106 4105 

Db 177024 TCCCCTCCCCTTCCTTTTCCCTTCTGTGGAATAACGCTTGTGTGTGTGTGTGTGTATATA 177083 

Qy 4106 4105 

Db 177084 TGCATGTGTGTATATTTCTCCCTTTCCCTTCCTTTCCCTCCCCTCCTCCCTTCCCTCCCC 177143 

Qy 4106 4105 

Db 177144 TCCTCGGTCCCTTCCCCTCCCCTTCCCTTTCCCTCCCTGTGGGATAACACTCGTGTGTGT 177203



Db 177204 GTGTGTGTGTGTGTGTATGCATGTGTGTGTATTTCTTCCCTCCCCTCCCCCCTCTCCTCC 177263 

Qy 4106 cccacccttccctttccctcctgtggaataacactcctgtgtgtgtgtgca 4156 

I I I I I I I I I I I I I I I I I I I I M I M M I I I I I I I I I I I I I M M I I I I I I I 
Db 177264 CCTCCCTTCCCCACCCTTCCCTTTCCCTCCTGTGGAATAACACTCCTGTGTGTGTGTGCA 177323 

Qy 4157 tgcatgtgtgtgtatatttctcacatattttcattcatgcatccgttgatggacacttgg 4216 

I I I I I II I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 177324 TGCATGTGTGTGTATATTTCTCACATATTTTCATTCATGCATCCGTTGATGGACACTTGG 177383

Qy 4217 gttgattccgtgtcctggctgctgggacagtgctgcgatgaacacgagggtacagacgcc 4276 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I II I 
Db 177384 GTTGATTCCGTGTCCTGGCTGCTGGGACAGTGCTGCGATGAACACGAGGGTACAGACGCC 177443 

Qy 4277 tctcctacacgctaatttcaactctttggatatacacccagcagtgggattgctggatca 4336

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 177444 TCTCCTACACGCTAATTTCAACTCTTTGGATATACACCCAGCAGTGGGATTGCTGGATCA 177503

Qy 4337 ggtgggagctctatttccacatttttgaggaacctccctgccgtctcccatggtggctgt 4396 

I I I I I I I I I I II I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
Db 177504 GGTGGGAGCTCTATTTCCACATTTTTGAGGAACCTCCCTGCCGTCTCCCATGGTGGCTGT 177563 

Qy 4397 gccaacgacgttcccagggacagagtgcaacgggcccctttcctccatgtcctcgccaac 4456 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II I I II I I I I I I II I II I I I I I I I I 
Db 177564 GCCAACGACGTTCCCAGGGACAGAGTGCAACGGGCCCCTTTCCTCCATGTCCTCGCCAAC 177623

Qy 4457 actcgctatcttttgcgttttgatgacagtcatcccaataggtgccagttggtacctcct 4516 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II 
Db 177624 ACTCGCTATCTTTTGCGTTTTGATGACAGTCATCCCAATAGGTGCCAGTTGGTACCTCCT 177683 

Qy 4517 gtggtttttatttgattttcctgatgattagtgatgctggacgttatttcgtctacactt 4576 

I I I I I I I I I I I I I I I I I I I M I I I I I I II I I I I II I I I I II I I I M I I I I I I I I I I I I I I 
Db 177684 GTGGTTTTTATTTGATTTTCCTGATGATTAGTGATGCTGGACGTTATTTCGTCTACACTT 177743

Qy 4577 cggccacttacatgttttccttcgagacacgcagattcaggtcctttgcacgttttaaaa 4636

I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 177744 CGGCCACTTACATGTTTTCCTTCGAGACACGCAGATTCAGGTCCTTTGCACGTTTTAAAA 177803

Qy 4637 ttttttttgtttgtttttgttattgagttgaattccttctacaatttgcaaattaactcc 4696 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 177804 TTTTTTTTGTTTGTTTTTGTTATTGAGTTGAATTCCTTCTACAATTTGCAAATTAACTCC 177863 

Qy 4697 tcatcatatacatggattgcaaatacccccgcctccccctggggttttgccttttcactg 4756 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 177864 TCATCATATACATGGATTGCAAATACCCCCGCCTCCCCCTGGGGTTTTGCCTTTTCACTG 177923 

Qy 4757 caaatactcccgcctccccatgggggttgccttttccctgccaatacccccacctcccca 4816 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 177924 CAAATACTCCCGCCTCCCCATGGGGGTTGCCTTTTCCCTGCCAATACCCCCACCTCCCCA 177983 

Qy 4817 tgggggttgccttttccctgcaaatacccccacctccccgtgggttctgccttttccctg 4876 

II I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 177984 TGGGGGTTGCCTTTTCCCTGCAAATACCCCCACCTCCCCGTGGGTTCTGCCTTTTCCCTG 178043 



Qy 4877 ccaatacccccgcctccccctgggggttgccttttcactctgttggtttcctttgcggaa 4936

| | I" | | | M | | I I I I I I I I I I I I I II II I I I I M I I I I I I I I I I I I I I I I I M I I I II I M 
Db 178044 CCAATACCCCCGCCTCCCCCTGGGGGTTGCCTTTTCACTCTGTTGGTTTCCTTTGCGGAA 178103 



Qy 4937 gctttctggtttgttgcactctcactgtctatttttgcttctgttgcctgtgcttgtggg 4996 

| | | M I I I I I I I I II I M I I M I I I I II I I I I I I I I I I I M I I II M I I I I I I I I I I I II 
Db 178104 GCTTTCTGGTTTGTTGCACTCTCACTGTCTATTTTTGCTTCTGTTGCCTGTGCTTGTGGG 178163 

Qy 4997 gccatattttaaaaaaatcattgcccggaccagcctcaagaagttttcctcctcicgtttt 5056 

|| | | | | | | | | || || I II II I I I I II I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I M I 
Db 178164 GCCATATTTTAAAAAAATCATTGCCCGGACCAGCCTCAAGAAGTTTTCCTCCTACGTTTT 178223 

Qy 5057 cttctaagagttttatggtgtcgggtcttaggtttgaatctttaatccgtgttgagttga 5116 

| || | || | || I I I I I I I I I I I I I M I M I I I I I I M II I I I I I I I I I I I I I I I I 

Db 178224 CTTCTAAGAGTTTTATGGTGTCGGGTCTTAGGTTTGAATCTTTAATCCGTGTTGAGTTGA 178283 

Qy 5117 ttttcgtaggtggtgtcggatgaggccctttcatcctcctccacttttcccagcaccacc 5176 

I | M I I I M M II M II I M I II I I I I I I I I I I I M I I I M M II II I I II I I ■ II I I I I 
Db 178284 TTTTCGTAGGTGGTGTCGGATGAGGCCCTTTCATCCTCCTCCACTTTTCCCAGCACCACC 178343

Qy 5177 tattgaggatgcccctttccccgtcgtgtgtccttggcgcctttgctgaaggtcagttgg 5236 

MINIMI I I I I I M I M I I I I I I I I I I I I I I M I I M I M I I 

Db 178344 TATTGAGGATGCCCCTTTCCCCGTCGTGTGTCCTTGGCGCCTTTGCTGAAGGTCAGTTGG 178403 

Qy 5237 ccgtaactgtgcatggggacccttcctggcccccctggtgccctgtgccccatatgtccc 5296 

|| | | | | | | | | | I I I I I I II II I I I I I I I I I I I I I I M I II II I II I I I I I I I I M I I I I I 
Db 178404 CCGTAACTGTGCATGGGGACCCTTCCTGGCCCCCCTGGTGCCCTGTGCCCCATATGTCCC 178463 

Qy 5297 accccctcccttactttttctccatggcatgaatcaccccagacctactatacaaaattt 5356 

I II I I II II I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IN 

Db 178464 ACCCCCTCCCTTACTTTTTCTCCATGGCATGAATCACCCCAGACCTACTATACATATTTT 178523

Qy 5357 atcctatttatttttatttatttatttatttttgagatggagtctcactctgtcacccag 5416 

M I I I I I I I I I I I I I II I M I I I I II I I I I I M I I I I I I I I I I II I I I I I I I M II I I I 
Db 178524 ATCTTATTTATTTTTATTTATTTATTTATTTTTGAGATGGAGTCTCACTCTGTCACCCAG 178583

Qy 5417 gctggagtgcagtggcacgatctcggctcactgcaagctccgcctcccaggttcacgcca 5476 

| || M | | I I M I I I I I I II II I I I I I I I I I I II I I I M I I I I I I I I I I I I I II I I I I I I I 
Db 178584 GCTGGAGTGCAGTGGCACGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCA 178643 

Qy 5477 ttctcctgcctcagcctcccaagtagctgggattacgggcgcccgccaccatgcccggct 5536 

M II I I I I I I I I I II I I I I I M I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 178644 TTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGATTACGGGCGCCCGCCACCATGCCCGGCT 178703

Qy 5537 aaattttttgtttttttcgtagagacagagtttccctatgttgcccccaggttggtctcg 5596 

I | | || I I I I I I I I I I M I I I M I II I I II I I I I I I I I I I I I I I I I I I 

Db 178704 AATTTTTTTGTATTTTTAGTGGAGACGGAGTTTCAACATGTT--AGCCAGGATGGTCTCG 178761 

Qy 5597 aactcctgggctcaagtaatccttccacttcggcctcccaaagtgctgggattacaggca 5656 

| I I I I I I Ml II I I II I I I I I I I I I II I I I II M I II I I I I II I I I I 
Db 178762 ATCTCCTGACCTC--GTGATCCACCCGCCTCAGCCTCCCAAAGTGCTGGGATTATAGGCG 178819

Qy 5657 tgagccattcggcccggcctattttttttttttcagacagagtttcactcttgtcaccca 5716 

MINIM N II II II II II I I I II II II II I I I I II II II I I I I I I I II I I I II I I 
Db 178820 TGAGCCATCGCGCCCGGCCTATTTTTTTTTTTTCAGACAGAGTTTCACTCTTGTCACCCA 178879



Qy 5717 ggctgaaatgcattgcaatgatcttggctcactgcaacttccacctcccaggttcaaagg 5776 



Mill I I I I I I I I I I I I I I II I I M I I I I I I I I I I I M I I I I I I I I I 

Db 178880 GGCTGGAGTGCAGTGGCATGATCTTGGCTCACTGCAACTTCCACCTCCCAGGTTCAAGCG 178939

Qy 5777 atttttctggcctcagcctcccgaggagctgggattacagtgtgcaaccaccacaccggg 5836 

Ml | || I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 178940 ATTCTCCT-GCCTCAGCCTCCCGAGTAGCTGGGATTACAGTGTGC-ACCACCACACCTGG 178997 

Qy 5837 ctaaaatttttggaattttttttttttactagagacagggttcaacaatgctggtcaggc 5896 

M II I I I I I II I II I I I I I I I M I I I I I I I I I I I I I I I I 

Db 178998 CT-AAATTTTTGTATTTTTTTTTTTTTACTAGAGACAGGGTTTCAACATGCTGGTCAGGC 179056

Qy 5897 tggtctcgaattcctgacctcaagtgatcctcccacctcggcctcccaaagtgctgggat 5956 

M I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 179057 TGGTCTCGAATTCCTGACCTCAAGTGATCCTCCCACCTCGGCCTCCCAAAGTGCTGGGAT 179116 

Qy 5957 tacaggcgtgagccgccatgcctggccatggatattgtaaatgttcttgtttgttgtatg 6016 

I | | | | | | | I I I I I I I I I I I I I I I I I I I II I II I I M I I I I I I I I I I M M I II II I I I I I 
Db 179117 TACAGGCGTGAGCCGCCATGCCTGGCCATGGATATTGTAAATGTTCTTGTTTGTTGTATG 179176

Qy 6017 ttttcctcactgggctgtgcactcctgagggcggggcatctgtcccattcttcagtgctg 6076 

| | | M I I I II I I I M I I I II II I M I I I I I I I I I I I I I I II I I M I I I I I I I I I I I I I I I 
Db 179177 TTTTCCTCACTGGGCTGTGCACTCCTGAGGGCGGGGCATCTGTCCCATTCTTCAGTGCTG 179236

Qy 6077 ggtcccctgtgtctgggacagtgtatacatacagcaggtgcataatcagtcttgactgga 6136 

| | || | | | | I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 179237 GGTCCCCTGTGTCTGGGACAGTGTATACATACAGCAGGTGCATAATCAGTCTTGACTGGA 179296

Qy 6137 agggtgagggagtcaacgcacatggcagtcattggactatgtgtctgagaagcataactc 6196 

| | | || | || | I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I M II I I I I I II II 
Db 179297 AGGGTGAGGGAGTCAACGCACATGGCAGTCATTGGACTATGTGTCTGAGAAGCATAACTC 179356

Qy 6197 acttaatcttgaagttcacttatggattgaagtgtgcggttcagtgacttttaatatatt 6256 

I || I I I I I I I I I I I I M I II M I I I I I I I I I I I I I I I I I M I I I II I I I I I I I I I 

Db 179357 ACTTAATCTTGAAGTTCACTTATGGATTGAAGTGTGCGGTTCAGTGACTTTTAATATATT 179416

Qy 6257 taccgagttgtgtaaccatcaccaccatctaattttaaatcattttcatcatccctaaaa 6316 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 179417 TACCGAGTTGTGTAACCATCACCACCATCTAATTTTAAATCATTTTCATCATCCCTAAAA 179476

Qy 6317 gaaacttcagacccactagctgtccctccccctattcctcccaccccagccctggtcctg 6376

II I I || I II I I I I I I II I I I I I II II M I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 179477 GAAACTTCAGACCCACTAGCTGTCCCTCCCCCTATTCCTCCCACCCCAGCCCTGGTCCTG 179536

Qy 6377 gccgcaggctgctcacctgcatctctctgtggatctgccggttgtggacatttcacacac 6436 

I | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I 

Db 179537 GCCGCAGGCTGCTCACCTGCATCTCTCTGTGGATCTGCCGGTTGTGGACATTTCACACAC 179596 

Qy 6437 ctgcgtgcagtcttctgtgcctgcctctttcactcgctgtgatgtttaagttcacccatg 6496 

II | | I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I 

Db 179597 CTGCGTGCAGTCTTCTGTGCCTGCCTCTTTCACTCGCTGTGATGTTTAAGTC"ACCCATG 179656

Qy 6497 ttgtcatctatatcggtacttacttcctttttttttttggagatgaagtcttgctcttgt 6556 

I I I I I I I II I I I I I I I I II I I I I I I I I I I I I M I I I I I I M I I M I I I I I I I I II I I I I 
Db 179657 TTGTCATCTATATCGGGACTTACTTCCTTTTTTTTTTTGGAGATGAAGTCTTGCTCTTGT 179716 



Qy 6557 cacccaggctggagtgcagtggcgtgatctcggctcacagcaacttctgcctctggggtt 6616 
I I I I I I I M I I II II I I I I I M II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 



Db 179717 CACCCAGGCTGGAGTGCAGTGGCGTGATCTCGGCTCACAGCAACTTCTGCCTCTGGGGTT 179776 



Qy 6617 caagtgattctcctgccttagcctcccaagtagctgggactacaggtttgcaccaccatg 6676 

| | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I II I I I I I I I I I I I M 
Db 179777 CAAGTGATTCTCCTGCCTTAGCCTCCCAAGTAGCTGGGACTACAGGTTTGCACCACCATG 179836 

Qy 6677 tcctgctaatttttttttttttgtatttttaatagagacagggtttctcctcattggcca 6736 

| | | | M | | M I I II I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 179837 TCCTGCTAATTTTTTTTTTTTTGTATTTTTAATAGAGACAGGGTTTCTCCTCATTGGCCA 179896

Qy 6737 ggctggtctcgaactcctgacctcagacgatccacctgcctcagcctcccgaagtgttgg 6796 

| | M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 

Db 179897 GGCTGGTCTCGAACTCCTGACCTCAGACGATCCACCTGCCTCAGCCTCCCGAAGTGTTGG 179956

Qy 6797 gattacaggcacgagccactgtgcccggccatcattcctttttactgctgactaatagtc 6856 

| | | | I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 179957 GATTACAGGCACGAGCCACTGTGCCCGGCCATCATTCCTTTTTACTGCTGACTAATAGTC 180016

Qy 6857 tgctgtgtgaatccaccgctagaaacccactcatcagttgatggtcatgtgggttgcttc 6916 

I I I I I I I I I M I I I I I I M I I I I I I I I I I I I I II I II I I I I I I I I I I I 

Db 180017 TGCTGTGTGAATCCACCGCTAGAAACCCACTCATCAGTTGATGGTCATGTGGGTTGCTTC 180076

Qy 6917 tgctattcgcttattatgaacagtgctggaataaacgttcctgtgcactcttgggcatac 6976 

| | | | | | | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I II I I I 
Db 180077 TGCTATTCGCTTATTATGAACAGTGCTGGAATAAACGTTCCTGTGCACTCTTGGGCATAC 180136

Qy 6977 gcctaggagtggaactgctgggtcaaatggtgactttacgtttaacgttctgaggagccg 7036 

| M | I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I N I I I I I I I 
Db 180137 GCCTAGGAGTGGAACTGCTGGGTCAAATGGTGACTTTACGTTTAACGTTCTGAGGAGCCG 180196

Qy 7037 ccaggcgttttaacacagtgactgcaccatttcacattcctgccaacaatgtgtgagaat 7096 

M I I I M I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II 

Db 180197 CCAGGCGTTTTAACACAGTGACTGCACCATTTCACATTCCTGCCAACAATGTGTGAGAAT 180256

Qy 7097 tccaatttctctacatccccaacattttcctttaaaaaaaagaaaaaagaaacatagcca 7156 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
Db 180257 TCCAATTTCTCTACATCCCCAACATTTTCCTTTAAAAAAAAGAAAAAAGAAACATAGCCA 180316

Qy 7157 tctaagtggatgtggagcagactgtccctctggtttgggtttgcgttgcttttatggctc 7216 

II | | | II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I 

Db 180317 TCTAAGTGGATGTGGAGCAGACTGTCCCTCTGGTTTGGGTTTGCGTTGCTTTTATGGCTC 180376

Qy 7217 atgatgtctgagtctctctccatgtgctcatggggattcgtatatctactttgggaaatg 7276 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I M I I I I I I II M I I I I I I I 
Db 180377 ATGATGTCTGAGTCTCTCTCCATGTGCTCATGGGGATTCGTATATCTACTTTGGGAAATG 180436

Qy 7277 cttattcaagtcctttgtccacatttgactgggttgcttgtctttttatttcatttacta 7336 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 180437 CTTATTCAAGTCCTTTGTCCACATTTGACTGGGTTGCTTGTCTTTTTATTTCATTTACTA 180496

Qy 7337 cgatgacagcccctacatggaaggattttgtttttgtaatcccattaccccgaggtgaga 7396 

I | M | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I 
Db 180497 CGATGACAGCCCCTACATGGAAGGATTTTGTTTTTGTAATCCCATTACCCCGAGGTGAGA 180556

Qy 7397 atgaattgccagttgctcaaggccttcagctcttagggaggagcctggacctggagctgc 7456 

I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I M 
Db 180557 ATGAATTGCCAGTTGCTCAAGGCCTTCAGCTCTTAGGGAGGAGCCTGGACCTGGAGCTGC 180616



Qy 7457 tccgggctctggcaaagctccaatcccggcctcagtccttgaggcctggtcctcacccag 7516 

| | I M I I I I I I M I I M I I M I I M I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I 
Db 180617 TCCGGGCTCTGGCAAAGCTCCAATCCCGGCCTCAGTCCTTGAGGCCTGGTCCTCACCCAG 180676 

Qy 7517 ctttctccttccaccgtgccatggaggaagcccgacctccctgcacggctggcctggggt 7576 

1111111111 I I I I I I I II II I I I I I I I I I I I I I I I I I I I 

Db 180677 CTTTCTCCTTCCACCGTGCCATGGAGGAAGCACGACCTCCCTGCACGGCTGGCCTGGGGT 180736

Qy 7577 tgttcacgactgagtccaggtgtccccagaacggatgtcactggtcacagtgttcctggt 7636 

| | M I II I II I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I 
Db 180737 TGTTCACGACTGAGTCCAGGTGTCCCCAGAACGGATGTCACTGGTCACAGTGTTCCTGGT 180796

Qy 7637 aataggtgaccccaggcacagggtgttcctgatcataggtaacccaggcacaggtgtccc 7696
I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I M M I I I I I I I M I I I I I I I I I M 



Db 


180797 


Qy 


7697 


Db 


180855 


Qy 


7757 


Db 


180913 


Qy 


7817 


Db 


180969 


Qy 


7877 


Db 


181029 


Qy 


7937 


Db 


181089 


Qy 


7997 


Db 


181148 



180854 



I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 



I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I 
-CCCAGGCACAGGTGT-CCTGGTCACAGATGT-CCCAGGCACAGGTGT-CCAGGCACAGG 180968 



| | | | | | | | I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I I I I 
TGTCTCCAGGCACAGGCGTCCCAGGTCACAGGTGTNCCCGGTCACAGGTGTCCCTGGTCA 181028



| | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I 

CAGGTGTCTCCAGGCACAGGTGTNCCTGGTCACAGGGTGTCCCGGTCACAGGTGTCCCAG 181088 



| I I I I I I I I I I I I I II I I I I I I I I I I M I I I II I I II I I I M I II I I I I I I I I I I I I I 



I II I I I I II I I I I I I I 
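Each alignment row prints its first and last coordinates. A gap character (-) occupies a column without consuming a residue. A row's end coordinate is therefore its start plus the number of non-gap characters, minus one. A row in which the query contributes no residues, like the "Qy 4106 4105" rows above, ends one position before it starts. The following Python sketch is checked against rows of this hit. The helper name is illustrative, and a run of dashes stands in for a residue-free query row.

```python
def row_end(start: int, aligned: str) -> int:
    """Last coordinate covered by an alignment row.

    Gap columns ('-') take up space in the row but consume no residue.
    """
    residues = sum(1 for ch in aligned if ch != "-")
    return start + residues - 1


# Db row starting at 178940, which contains two gap columns.
db_row = "ATTCTCCT-GCCTCAGCCTCCCGAGTAGCTGGGATTACAGTGTGC-ACCACCACACCTGG"
print(row_end(178940, db_row))   # 178997, the end coordinate in the listing
print(row_end(4106, "-" * 60))   # 4105, a query row with no residues
```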
LOCUS AC079031 187950 bp DNA HTG 2S-JUN-2001

DEFINITION Homo sapiens chromosome 12q clone RP11-503G7, WORKING DRAFT
SEQUENCE, 8 unordered pieces.
REFERENCE 2 (bases 1 to 187950)
Direct Submission



Zorrilla,S., Nelson,D.

JOURNAL Submitted (17-AUG-2000) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
COMMENT On Jun 8, 2001 this sequence version replaced gi:13794207.
Genome Center

Center: Baylor College of Medicine

Center code: BCM

Web site: http://www.hgsc.bcm.tmc.edu/

Contact: hgsc-help@bcm.tmc.edu
Project Information 

Center project name: HBQO 

Center clone name: RP11-503G7 
Summary Statistics 

Sequencing vector: Plasmid; M77789 

Sequencing vector: M13; L08821 

Chemistry: Dye-primer Bodipy: 36% of reads 

Chemistry: Dye-terminator Big Dye: 64% of reads 

Assembly program: Phrap; version 0.990329 

Consensus quality: 179703 bases at least Q40 

Consensus quality: 182695 bases at least Q30 

Consensus quality: 184221 bases at least Q20 

Estimated insert size: 184740; sum-of-contigs estimation 

Quality coverage: 0x in Q20 bases; agarose-fp estimation

Quality coverage: 5.1x in Q20 bases; sum-of-contigs estimation



NOTE: Estimated insert size may differ from sequence length 

(see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
NOTE: This is a 'working draft' sequence. It currently
consists of 8 contigs. The true order of the pieces
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
1..187950



gap of unknown length 
contig of 12758 bp in length 
gap of unknown length 
contig of 3554 bp in length 
gap of unknown length 



/organism="Homo sapiens" 
/db_xref="taxon:9606"
/chromosome="12q"
/clone="RP11-503G7"
BASE COUNT 37588 a 56082 c 57191 g 36388 t 701 others




ORIGIN 

Query Match 100.0%; Score 40; DB 2; Length 187950; 

Best Local Similarity 100.0%; Pred. No. 1.7e-05; 

Matches 40; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 actacaggtttgcaccaccatgtcctgctaattttttttt 40

||||||||||||||||||||||||||||||||||||||||
Db 160697 ACTACAGGTTTGCACCACCATGTCCTGCTAATTTTTTTTT 160658 
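In this hit the Db coordinates decrease (160697 to 160658) while the Qy coordinates increase. This sketch assumes the common convention that descending database coordinates mark a match on the complementary strand. Under that assumption, the printed Db row is the reverse complement of positions 160658 to 160697 of the forward record, and you can recover the forward-strand text by taking the reverse complement. A Python sketch (IUPAC ambiguity codes other than N are not handled):

```python
# Complement table for unambiguous bases plus N, in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


db_row = "ACTACAGGTTTGCACCACCATGTCCTGCTAATTTTTTTTT"  # Db 160697..160658
forward = reverse_complement(db_row)
print(forward)
```

Applying the function twice returns the original string, which is a quick sanity check on the complement table.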



RESULT 3

HSU14383

LOCUS HSU14383 1403 bp mRNA linear PRI 31-DEC-1994
DEFINITION Human mucin (MUC8) mRNA, partial cds.
ACCESSION U14383 U04799
VERSION U14383.1 GI:606953
KEYWORDS
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 907 to 1403)
AUTHORS Sachdev,G.P.
TITLE Carboxy terminus of a major human tracheobronchial mucin MUC8
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 941)
AUTHORS Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
TITLE A novel human airway mucin cDNA encodes a protein with unique
tandem-repeat organization
JOURNAL Biochem. J. 300 (Pt 2), 295-298 (1994)
MEDLINE 94271137
REFERENCE 3 (bases 1 to 1403)
AUTHORS Sachdev,G.P.
TITLE Direct Submission
JOURNAL Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
Oklahoma Health Sciences Center, Medicinal Chemistry, College of
Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES Location/Qualifiers
source 1..1403
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="12"
/clone="PAM2"
/tissue_type="trachea/bronchus"
/dev_stage="adult"
repeat_region <1..892
/rpt_type=tandem
/rpt_unit=3..43
gene 1..1403
/gene="MUC8"
CDS <1..944
/gene="MUC8"
/codon_start=3
/product="mucin"
/protein_id="AAA58346.1"
/db_xref="GI:501033"
/translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
3'UTR 945..1403
/gene="MUC8"
polyA_signal 1350..1355
/gene="MUC8"
polyA_site 1403
/gene="MUC8"
/note="17 A residues"
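The CDS feature of HSU14383 is located at <1..944 and carries /codon_start=3. In the GenBank feature table, this means the first complete codon begins at the third base of the feature. The leading "<" marks the CDS as 5' partial. The following Python sketch shows how that offset frames the CDS into codons. The input is a placeholder of the right length, not the real MUC8 bases, and the helper name is illustrative.

```python
def frame_codons(cds: str, codon_start: int) -> list[str]:
    """Split a CDS into complete codons after skipping codon_start - 1 bases.

    GenBank allows /codon_start values of 1, 2 or 3; trailing bases that do
    not fill a whole codon are dropped.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    framed = cds[codon_start - 1:]
    usable = len(framed) - len(framed) % 3
    return [framed[i:i + 3] for i in range(0, usable, 3)]


placeholder_cds = "N" * 944  # stands in for bases 1..944 of HSU14383
codons = frame_codons(placeholder_cds, codon_start=3)
print(len(codons))
```

Skipping two bases leaves 942, or 314 codons. The /translation qualifier lists 313 residues. This is consistent with the last codon of the CDS being a stop codon, because GenBank translations omit the terminal stop.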
BASE COUNT 254 a 490 c 413 g 246 t
ORIGIN



Query Match 9.4%; Score 964.6; DB 9; Length 1403;

Best Local Similarity 95.6%; Pred. No. 2.1e-137;

Matches 1100; Conservative 0; Mismatches 39; Indels 12; Gaps 10;



Qy 


1482 


Db 


2 


Qy 


1542 


Db 


62 


Qy 


1600 


Db 


122 


Qy 


1660 


Db 


182 


Qy 


1720 


Db 


242 


Qy 


1780 


Db 


300 


Qy 


1840 


Db 


360 


Qy 


1900 


Db 


420 


Qy 


1960 


Db 


479 


Qy 


2020 


Db 


537 


Qy 


2080 


Db 


596 



cacgagctgcccacgtcctctccaggaagggaccccgggttcacgagctgcccacgtcgt 1541 

I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I 

CACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGCCCT 61 

ctccaggaagggac-ccgggtccacgagctgcccacgtcctctccaggaaaggac-ccgg 1599 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
CTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGG 12 1 



|| M | | | | I I I I I I I III I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
GTTCATGAGCTGCCCACGCCCTTTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGT 

cctctccaggaagggaccccgggttcacgagctgcccacgtcctctccaggaagggaccc 

I I || I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I 

CCTCTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCC 



1659 



181 



1719 



241 



1779 



I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I II 
CGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCC- 



II I II I I I I I I I I I I 
-GGGTCACGAACTGCCCA 



299 



I | I I I I I I I I I I I I I II II II I I I I I I I I I I II I I II I I M I I M I M I I I I I I I I I I I I 
CGTCCTCTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGA 359 



I | | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
CACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAAGGGACCCCGGGTTCATGAGCTGC 



II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CCACGTCCTCTCCAGGAAGGGACCCGGGT- 



I I II I I I I I I I I I I I I I I I I II I I I I I I I I 
-CACGAACTGCCCACGCCCTCTCCAGGAGGG 



MINIMI I II I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I II I I 

gacccgggt-cacgagctgcccacgtcgtctccaggaagggacccgggt-cacgagctgc 



II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

CCACGTCCTCTCCAGGAAGGGACCCGGGT - 



II I I I I I I I I I I I I I I I I I I I M I I I II I I 

-CACGAACTGCCCACGCGCTCTCCAGGAGGG 



419 



478 



536 



595 



I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I . I I I I I I I 



Qy 214 0 gcccacgtcctctccaggaggggacaccgggttcacgagctgcccacgtcctctccagga 2199 
I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I II I I I I I I I I I I I I I I I I I II I II II I I 



Db 656 GCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGA 715 

ggggacaccgggttcacgagctgcccacgccctctccaggaggggacaccgggttcacga 22 5 < 
I I I I I I I I I I I I I I I I I I I I I I I I I II II M I M I M I I I I I I I I I I I I 

GGGGACACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAGGGGACACCGGG"TCACGA 775 

gctgcccacgtcctctccaggaagggacccgggtccacgagctgcccacgtcctctccag 231S 

| || || | I I I II I I I I I I I i I I I I II I I I M I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
GCTGCCCACGTCCTCTCCAGGAAGGGACCCGGGT-CACGAGCTGCCCACGTCCTCTCCAG 834 

gaggggacaccgggttcacgagctgcccacgcactttccaggaagggaccccgggttcag 237 5 
I I I I I I I I I I I I II I I I M I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
GAGGGGACACCGGGTTCACGAGCTGCCCACGCACTTTCCAGGAAGGGACCCCGGGTTCAG 894 

gtctcctgccggcccacatcgtgcctttgtgtaaatcagaagaaagatgaggaacaggcc 243! 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
GTCTCCTGCCGGCCCACATCGTGCCTTTGTGTA7^ATCAGAAGAAAGATGAGGA\CAGGCC 954 

ctcctctctctccaggcaggctttggtggaggggctggatctcctgccgcaccttccctg 24 9! 
I I M I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I 
CTCCTCTCTCTCCAGGCAGGCTTTGGTGGAGGGGCTGGATCTCCTGCCGCACCTTCCCTG 101 

gcagggcaccctgtgcttgagccccagaactgcaggcggccggcagagaaggggtccatg 255 
Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II 



I I I I I I I I I I I I I II I I II II I I I I I M I I I I I I I I I I I I I I I Ml 

tTGGCGCCTCGGTGCG — GCCTTGGACCTGCCCCCATGGACCTGGAGACAGGGTTTCTCC 1131 



Db 


656 


Qy 


2200 


Db 


716 


Qy 


2260 


Db 


776 


Qy 


2320 


Db 


835 


Qy 


2380 


Db 


895 


Qy 


2440 


Db 


955 


Qy 


2500 


Db 


1015 


yy 


9 ^ fin 


Db 


1074 


Qy 


2620 


Db 


1132 



III III 



HSU14383
LOCUS       HSU14383     1403 bp    mRNA    linear   PRI 31-DEC-1994
DEFINITION  Human mucin (MUC8) mRNA, partial cds.
ACCESSION   U14383 U04799
VERSION     U14383.1  GI:606953
KEYWORDS
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 907 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Carboxy terminus of a major human tracheobronchial mucin MUC8
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 941)
  AUTHORS   Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
  TITLE     A novel human airway mucin cDNA encodes a protein with unique
            tandem-repeat organization
  JOURNAL   Biochem. J. 300 (Pt 2), 295-298 (1994)
  MEDLINE   94271137
REFERENCE   3  (bases 1 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
            Oklahoma Health Sciences Center, Medicinal Chemistry, College of
            Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT     On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES             Location/Qualifiers
     source          1..1403
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="PAM2"
                     /tissue_type="trachea/bronchus"
                     /dev_stage="adult"
     repeat_region   <1..892
                     /rpt_type=tandem
                     /rpt_unit=3..43
     gene            1..1403
                     /gene="MUC8"
     CDS             <1..944
                     /gene="MUC8"
                     /codon_start=3
                     /product="mucin"
                     /protein_id="AAA58346.1"
                     /db_xref="GI:501033"
                     /translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
                     PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
                     QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
                     RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
                     PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
                     PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
     3'UTR           945..1403
                     /gene="MUC8"
     polyA_signal    1350..1355
                     /gene="MUC8"
     polyA_site      1403
                     /gene="MUC8"
                     /note="17 A residues"
BASE COUNT      254 a    490 c    413 g    246 t
ORIGIN



Query Match 67.8%; Score 1072.4; DB 9; Length 1403;
Best Local Similarity 88.4%; Pred. No. 1.9e-225;
Matches 1373; Conservative 0; Mismatches 26; Indels 155; Gaps 13;

Qy          2 cacgagctgcccacgtcctctccaggaagggaccccgggttcacgagctgcccacgtcgt 61
              ||||||||||||||||||||||||||| ||||| |||||||||||||||||||||| | |
Db          2 CACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGCCCT 61

Qy         62 ctccaggaagggac-ccgggtccacgagctgcccacgtcctctccaggaaaggac-ccgg 119
              |||||||| ||||| |||||| |||||||||||||||||||||||||||  |||| ||||
Db         62 CTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGG 121

Qy        120 gtccacgagctggccacgtcctctgcaggaagggaccccgggtccacgagctgcccacgt 179
              || || |||||| ||||| ||| | |||||||||||||||||| ||||||||||||||||
Db        122 GTTCATGAGCTGCCCACGCCCTTTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGT 181

Qy        180 cctctccaggaagggaccccgggttcacgagctgcccacgtcctctccaggaagggaccc 239
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        182 CCTCTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCC 241

Qy        240 cgggtccacgagctgcccacgtcctctccaggaagggaccccgggtccacgaactgccca 299
              ||||| |||||||||||||||||||||||||||||||||||  ||  |||||||||||||
Db        242 CGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCC--GGGTCACGAACTGCCCA 299

Qy        300 cgtcctctccaggaagggaccccgggttcacgagctgcccacgtcctctccaggagggga 359
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        300 CGTCCTCTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGA 359

Qy        360 caccgggttcacgagctgcccacgccctctccaggaagggaccccgggttcatgagctgc 419
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        360 CACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAAGGGACCCCGGGTTCATGAGCTGC 419

Qy        420 ccacgtcctctccaggaagggacccgggtccacgaactgcccacgccctctccaggaggg 479
              ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Db        420 CCACGTCCTCTCCAGGAAGGGACCCGGGT-CACGAACTGCCCACGCCCTCTCCAGGAGGG 478

Qy        480 gacccgggtccacgagctgcccacgtcgtcaacgggaagggacccgggtccacgagctgc 539
              ||||||||| ||||||||||||||||||||  | ||||||||||||||| ||||||||||
Db        479 GACCCGGGT-CACGAGCTGCCCACGTCGTCTCCAGGAAGGGACCCGGGT-CACGAGCTGC 536

Qy        540 ccacgtcctctccaggaagggacccgggtccacgaactgcccacgcgctctccaggaggg 599
              ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Db        537 CCACGTCCTCTCCAGGAAGGGACCCGGGT-CACGAACTGCCCACGCGCTCTCCAGGAGGG 595

Qy        600 gacaccgggttcacgagctgcccacgccctctccaggaagggaccccgggttcacgagct 659
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        596 GACACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAAGGGACCCCGGGTTCACGAGCT 655

Qy        660 gcccacgtcctctccaggaggggacaccgggttcacgagctgcccacgtcctctccagga 719
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        656 GCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGA 715

Qy        720 [illegible]
Db        716 GGGGACACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAGGGGACACCGGGTTCACGA 775

Qy        780 gctgcccacgtcctctccaggaagggacccgggtccacgagctgcccacgtcctctccag 839
              |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Db        776 GCTGCCCACGTCCTCTCCAGGAAGGGACCCGGGT-CACGAGCTGCCCACGTCCTCTCCAG 834

Qy        840 gaggggacaccgggttcacgagctgcccacgcactttccaggaagggaccccgggttcag 899
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        835 GAGGGGACACCGGGTTCACGAGCTGCCCACGCACTTTCCAGGAAGGGACCCCGGGTTCAG 894

Qy        900 gtctcctgccggcccacatcgtgcctttgtgtaaatcagaagaaagatgaggaacaggcc 959
Db        895 [illegible]

Qy        960 ctcctctctctccaggcaggctttggtggaggggctggatctcctgccgcaccttccctg 1019
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        955 CTCCTCTCTCTCCAGGCAGGCTTTGGTGGAGGGGCTGGATCTCCTGCCGCACCTTCCCTG 1014

Qy       1020 [illegible]
Db       1015 [illegible]

Qy       1080 [illegible]
Db       1074 ATGGCGCCTCGGTGCG--GCCTTGGACCTGCCCCCATGGACCTGG 1116

Qy       1140 [illegible]
Db       1117 [illegible] 1116

Qy       1200 [illegible]
Db       1117 [illegible] 1116

Qy       1260 [illegible]
Db       1117 [illegible]

Qy       1320 [illegible]
Db       1171 [illegible]

Qy       1380 [illegible]
Db       1230 [illegible]

Qy       1440 [illegible]
Db       1290 [illegible]

Qy       1500 [illegible]
Db       1350 [illegible]



RESULT 2
HSU14383
LOCUS       HSU14383     1403 bp    mRNA    linear   PRI 31-DEC-1994
DEFINITION  Human mucin (MUC8) mRNA, partial cds.
ACCESSION   U14383 U04799
VERSION     U14383.1  GI:606953
KEYWORDS
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 907 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Carboxy terminus of a major human tracheobronchial mucin MUC8
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 941)
  AUTHORS   Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
  TITLE     A novel human airway mucin cDNA encodes a protein with unique
            tandem-repeat organization
  JOURNAL   Biochem. J. 300 (Pt 2), 295-298 (1994)
  MEDLINE   94271137
REFERENCE   3  (bases 1 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
            Oklahoma Health Sciences Center, Medicinal Chemistry, College of
            Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT     On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES             Location/Qualifiers
     source          1..1403
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="PAM2"
                     /tissue_type="trachea/bronchus"
                     /dev_stage="adult"
     repeat_region   <1..892
                     /rpt_type=tandem
                     /rpt_unit=3..43
     gene            1..1403
                     /gene="MUC8"
     CDS             <1..944
                     /gene="MUC8"
                     /codon_start=3
                     /product="mucin"
                     /protein_id="AAA58346.1"
                     /db_xref="GI:501033"
                     /translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
                     PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
                     QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
                     RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
                     PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
                     PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
     3'UTR           945..1403
                     /gene="MUC8"
     polyA_signal    1350..1355
                     /gene="MUC8"
     polyA_site      1403
                     /gene="MUC8"
                     /note="17 A residues"
BASE COUNT      254 a    490 c    413 g    246 t
ORIGIN



Query Match 21.1%; Score 334; DB 9; Length 1403;
Best Local Similarity 99.8%; Pred. No. 2.8e-181;
Matches 454; Conservative 0; Mismatches 0; Indels 1; Gaps 1;

Qy        570 cacgaactgcccacgcgctctccaggaggggacaccgggttcacgagctgcccacgccct 629
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        566 CACGAACTGCCCACGCGCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGCCCT 625

Qy        630 ctccaggaagggaccccgggttcacgagctgcccacgtcctctccaggaggggacaccgg 689
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        626 CTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGG 685

Qy        690 gttcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgcccacgc 749
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        686 GTTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGC 745

Qy        750 cctctccaggaggggacaccgggttcacgagctgcccacgtcctctccaggaagggaccc 809
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        746 CCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCC 805

Qy        810 gggtccacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgcccac 869
              |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        806 GGGT-CACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCAC 864

Qy        870 gcactttccaggaagggaccccgggttcaggtctcctgccggcccacatcgtgcctttgt 929
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        865 GCACTTTCCAGGAAGGGACCCCGGGTTCAGGTCTCCTGCCGGCCCACATCGTGCCTTTGT 924

Qy        930 gtaaatcagaagaaagatgaggaacaggccctcctctctctccaggcaggctttggtgga 989
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        925 GTAAATCAGAAGAAAGATGAGGAACAGGCCCTCCTCTCTCTCCAGGCAGGCTTTGGTGGA 984

Qy        990 ggggctggatctcctgccgcaccttccctggcagg 1024
              |||||||||||||||||||||||||||||||||||
Db        985 GGGGCTGGATCTCCTGCCGCACCTTCCCTGGCAGG 1019

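Each hit's summary reports identities, conservative substitutions, mismatches and indels separately. The Python sketch below assumes, without confirmation from the report itself, that Best Local Similarity is the share of alignment columns that are identities or conservative substitutions, with the column count taken as matches + conservative + mismatches + indels. Under that assumption it reproduces the 95.6%, 88.4% and 99.8% figures printed for the three HSU14383 hits. `HitCounts` and `best_local_similarity` are illustrative names, not part of any search tool.

```python
from typing import NamedTuple


class HitCounts(NamedTuple):
    """Counts copied from a 'Matches ...; Conservative ...' summary line."""
    matches: int
    conservative: int
    mismatches: int
    indels: int


def best_local_similarity(c: HitCounts) -> float:
    """Assumed definition: (identities + conservative) / alignment columns, in percent.

    Each alignment column is an identity, a conservative substitution,
    a mismatch, or an indel position (a base opposite a gap).
    """
    columns = c.matches + c.conservative + c.mismatches + c.indels
    return 100.0 * (c.matches + c.conservative) / columns


# Summary lines of the three HSU14383 hits in this report.
hits = {
    "Query Match 9.4%": HitCounts(1100, 0, 39, 12),
    "Query Match 67.8%": HitCounts(1373, 0, 26, 155),
    "Query Match 21.1%": HitCounts(454, 0, 0, 1),
}

for label, counts in hits.items():
    print(f"{label}: best local similarity {best_local_similarity(counts):.1f}%")
```

Because every hit here reports Conservative 0, counting conservative substitutions as similar does not change these three figures.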


LOCUS       HSU14383     1403 bp    mRNA    linear   PRI 31-DEC-1994
DEFINITION  Human mucin (MUC8) mRNA, partial cds.
ACCESSION   U14383 U04799
VERSION     U14383.1  GI:606953
KEYWORDS
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 907 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Carboxy terminus of a major human tracheobronchial mucin MUC8
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 941)
  AUTHORS   Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
  TITLE     A novel human airway mucin cDNA encodes a protein with unique
            tandem-repeat organization
  JOURNAL   Biochem. J. 300 (Pt 2), 295-298 (1994)
  MEDLINE   94271137
REFERENCE   3  (bases 1 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
            Oklahoma Health Sciences Center, Medicinal Chemistry, College of
            Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT     On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES             Location/Qualifiers
     source          1..1403
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="PAM2"
                     /tissue_type="trachea/bronchus"
                     /dev_stage="adult"
     repeat_region   <1..892
                     /rpt_type=tandem
                     /rpt_unit=3..43
     gene            1..1403
                     /gene="MUC8"
     CDS             <1..944
                     /gene="MUC8"
                     /codon_start=3
                     /product="mucin"
                     /protein_id="AAA58346.1"
                     /db_xref="GI:501033"
                     /translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
                     PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
                     QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
                     RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
                     PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
                     PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
     3'UTR           945..1403
                     /gene="MUC8"
     polyA_signal    1350..1355
                     /gene="MUC8"
     polyA_site      1403
                     /gene="MUC8"
                     /note="17 A residues"
BASE COUNT      254 a    490 c    413 g    246 t
ORIGIN



alignment_scores:

Quality: 1751.50 Length: 439 

Ratio: 4.658 Gaps: 7 

Percent Similarity: 85.649 Percent Identity: 84.282 

alignment_block : 
US-09-627-465B-3 x HSU14383 

Align seg 1/1 to: HSU14383 from: 1 to: 1403 

1 ThrSerCysProArgProLeuGlnGluGlyThrProGlySerArgAlaAl 17 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M II I II I I I I I I I I I I 
3 ACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGC 52 

17 aHisValValSerArgLysGly.ProGlySerThrSerCysProArgPro 33

| | | | :::::: I I I I I I ::: I I I III I I I I I I I I I I II I I I I I I 

53 CCACGCCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCT 102 

34 LeuGlnGluArgThr.ArgValHisGluLeuAlaThrSerSerAlaGlyA 50

I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I 

103 CTCCAGGAGGGGACACCGGGTTCATGAGCTGCCCACGCCCTTTCCAGGAA 152 

50 rgAspProGlySerThrSerCysProArgProLeuGlnGluGlyThrPro 66 

I M I I I I I I I I I I I I I I I I I I I I I I I II I I I M I I I I I I I I I I I I I I 

153 GGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCCCG 202 

67 GlySerArgAlaAlaHisValLeuSerArgLysGlyProArgValHisGl 83 

II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

203 GGTTCACGAGCTGCCCACGTCCTCTCCAGGAAGGGACCCCGGGTTCACGA 252 

83 uLeuProThrSerSerProGlyArgAspProGly.SerThrAsnCysPro 99

I I I I II II I II I I I I I I I I I I I I I I I I M I I I I I ::: I I I I I I I I I 
253 GCTGCCCACGTCCTCTCCAGGAAGGGACCCGGGTCACG...AACTGCCCA 299

100 ArgProLeuGlnGluGlyThrProGlySerArgAlaAlaHisValLeuSe 116 

M I I I II I II II I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I M I I I 
300 CGTCCTCTCCAGGAAGGGACCCCGGGTTCACGAGCTGCCCACGTCCTCTC 349 

116 rArgArgGlyHisArgValHisGluLeuProThrProSerProGlyArgA 133 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 

350 CAGGAGGGGACACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAAGGG 399 

133 spProGlyPheMetSerCysProArgProLeuGlnGluGlyThrArgVal 149 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

400 ACCCCGGGTTCATGAGCTGCCCACGTCCTCTCCAGGAAGGGACCCGGGT. 448

150 HisGluLeuProThrProSerProGlyGlyAspProGlyProArgAlaAl 166 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I 
449 CACGAACTGCCCACGCCCTCTCCAGGAGGGGACCCGGGTCA.CGAGCTGC 497



166 aHisValValAsnGlyLysGlyProGlySerThrSerCysProArgProL 183 
I M I I I I I I I :: : I II II II I I I I I I I I I I I I I I M I I I I I I I I 



498 CCACGTCGTCTCCAGGAAGGGACCCGGGTC.ACGAGCTGCCCACGTCCTC 546



183 euGlnGluGlyThrArgValHisGluLeuProThrArgSerProGlyGly 199 

| | | | M II I I I II II I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I 
547 TCCAGGAAGGGACCCGGGT.CACGAACTGCCCACGCGCTCTCCAGGAGGG 595

200 AspThrGlyPheThrSerCysProArgProLeuGlnGluGlyThrProGl 216 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I 

596 GACACCGGGTTCACGAGCTGCCCACGCCCTCTCCAGGAAGGGACCCCGGG 645

216 ySerArgAlaAlaHisValLeuSerArgArgGlyHisArgValHisGluL 233 

I ] I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I ! I I 
646 TTCACGAGCTGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGC 695 

233 euProThrSerSerProGlyGlyAspThrGlyPheThrSerCysProArg 249 

| | | | | | | I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I II 
696 TGCCCACGTCCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGC 745 

250 ProLeuGlnGluGlyThrProGlySerArgAlaAlaHisValLeuSerAr 266 

| I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I II I 
746 CCTCTCCAGGAGGGGACACCGGGTTCACGAGCTGCCCACGTCCTCTCCAG 795 

266 gLysGlyProGlySerThrSerCysProArgProLeuGlnGluGlyThrP 283 

| | | | | I I I I I I I I I II II I I I I I I II I I M II II II I I I I I I I I I I I 
796 GAAGGGACCCGGGTC.ACGAGCTGCCCACGTCCTCTCCAGGAGGGGACAC 844

283 roGlySerArgAlaAlaHisAlaLeuSerArgLysGlyProArgValGln 299

I I I I I I I I I I I I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
845 CGGGTTCACGAGCTGCCCACGCACTTTCCAGGAAGGGACCCCGGGTTCAG 894

300 ValSerCysArgProThrSerCysLeuCysValAsnGlnLysLysAspGl 316 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
895 GTCTCCTGCCGGCCCACATCGTGCCTTTGTGTAAATCAGAAGAAAGATGA 944

316 uGluGlnAlaLeuLeuSerLeuGlnAlaGlyPheGlyGlyGlyAlaGlyS 333 

I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I II I I I I I I I I I II 
945 GGAACAGGCCCTCCTCTCTCTCCAGGCAGGCTTTGGTGGAGGGGCTGGAT 994 

333 erProAlaAlaProSerLeuAlaGlyHisProValLeuGluProGlnAsn 345 

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I ::: I M II M I II I I 
995 CTCCTGCCGCACCTTCCCTGGCAGG.CACCCTGTCGTTGAGCCCCAGAAC 1043

350 CysArgArgProAlaGluLysGlySerMetMetAlaProArgCys.AlaA 366 

I | I I I I I I I I I I I I I I I I I I I I I II I I M M I M I I I I I I II I II Ml 
1044 TGCAGGCGGCCGGCAGAGAAGGGGTCCATGATGGCGCCTCGGTGCGGCC. 1092

366 laLeuAspLeuProProTrpThrTrpGluProProGlySerSerHisSer 382 

I I I I I I I I I I I II I I II I I I I I I I 
1093 ..TTGGACCTGCCCCCATGGACCTGG 1116

383 GlyLysGluGlySerGlyHisGlyGlyArgProGlyProIleProValPr 399 

1116 II 16 

399 oTrpProPhePheLeuLeuProValCysHisCysProGlyAlaPheAlaP 416 




416 roAlaPheProLeuSerArgGlnGlyPheSerSerLeuAlaArgLeuVal 432 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
1117 AGACAGGGTTTCTCCTCATTGGCCAGGCTGGTC 1149 

433 SerAsnSer 435 
|||||||||
1150 TCGAACTCC 1158 

seq_name: gb_vi:U97553 



RESULT 1
AI126846
LOCUS       AI126846      464 bp    mRNA    EST 26-OCT-1998
DEFINITION  qb86d03.x1 Soares_fetal_heart_NbHH19W Homo sapiens cDNA clone
            IMAGE:1706981 3' similar to contains Alu repetitive element;contains
            L1.t1 L1 L1 repetitive element ;, mRNA sequence.
ACCESSION   AI126846
VERSION     AI126846.1  GI:3595360
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 464)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Insert Length: 437   Std Error: 0.00
            Seq primer: -40m13 fwd. ET from Amersham
            High quality sequence stop: 421.
FEATURES             Location/Qualifiers
     source          1..464
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:1706981"
                     /clone_lib="Soares_fetal_heart_NbHH19W"
                     /sex="unknown"
                     /dev_stage="19 weeks"
                     /lab_host="DH10B (ampicillin resistant)"
                     /note="Organ: heart; Vector: pT7T3D (Pharmacia) with a
                     modified polylinker; Site_1: Not I; Site_2: Eco RI; 1st
                     strand cDNA was primed with a Not I - oligo(dT) primer [5'
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCATCTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was size selected, ligated to Eco RI
                     adapters (Pharmacia), digested with Not I and cloned into
                     the Not I and Eco RI sites of a modified pT7T3 vector
                     (Pharmacia). Library went through one round of
                     normalization to a Cot = 5. Library constructed by
                     M.Fatima Bonaldo. This library was constructed from the
                     same fetus as the fetal lung library, Soares fetal lung
                     NbHL19W."
BASE COUNT      113 a    116 c     95 g    140 t
ORIGIN



Query Match 100.0%; Score 39; DB 10; Length 464;
Best Local Similarity 100.0%; Pred. No. 3.2e-05;
Matches 39; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          1 tgtgcactcttgggcatacgcctaggagtggaactgctg 39
              |||||||||||||||||||||||||||||||||||||||
Db        271 TGTGCACTCTTGGGCATACGCCTAGGAGTGGAACTGCTG 309



RESULT 2
US-09-058-489-38/c
Sequence 38, Application US/09058489
Patent No. 6103886
GENERAL INFORMATION:
APPLICANT: Whitehead Institute for Biomedical Research
APPLICANT: Lahn, Bruce
APPLICANT: Page, David
TITLE OF INVENTION: Genes in the Non-Recombining Region of
TITLE OF INVENTION: the Y Chromosome
FILE REFERENCE: WHI97-08pA
CURRENT APPLICATION NUMBER: US/09/058,489
CURRENT FILING DATE: 1998-04-10
EARLIER APPLICATION NUMBER: 60/041,877
EARLIER FILING DATE: 1997-04-11
NUMBER OF SEQ ID NOS: 91
SOFTWARE: FastSEQ for Windows Version 3.0
SEQ ID NO 38
LENGTH: 1964
TYPE: DNA
ORGANISM: Human
US-09-058-489-38

Query Match 40.0%; Score 16; DB 3; Length 1964;
Best Local Similarity 100.0%; Pred. No. 11;
Matches 16; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         14 accaccgtgtcctgct 29
              ||||||||||||||||
Db        852 ACCACCGTGTCCTGCT 837

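In this hit, and in the AA463832/C hit, the Db coordinates run downward (852 to 837 here). Together with the /c suffix on the identifier, this reads as a match on the complementary strand of the database sequence; that reading is an assumption drawn from the descending numbering, not something the report states. The Python sketch below shows what it implies: the forward-strand bases at positions 837..852 would be the reverse complement of the printed Db row. `reverse_complement` and `forward_interval` are illustrative helpers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_interval(db_start: int, db_end: int) -> tuple[int, int]:
    """Normalize a possibly descending Db range to a 1-based forward interval."""
    return min(db_start, db_end), max(db_start, db_end)


db_row = "ACCACCGTGTCCTGCT"  # printed Db row, numbered 852 down to 837
start, end = forward_interval(852, 837)
assert end - start + 1 == len(db_row)
print(start, end, reverse_complement(db_row))
```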


RESULT 6
G26698
LOCUS       G26698        462 bp    DNA     linear   STS 02-JUN-1996
DEFINITION  human STS TH, sequence tagged site.
ACCESSION   G26698
VERSION     G26698.1  GI:1348930
KEYWORDS    STS; STS sequence; primer; sequence tagged site,
            human STSs derived from sequences in dbEST and the Unigene
            collection.
SOURCE
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 462)
  AUTHORS   Hudson,T.
  TITLE     Whitehead Institute/MIT Center for Genome Research; Physically
            Mapped STSs
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 462)
  AUTHORS   Hudson,T.
  TITLE     Whitehead Institute/MIT Center for Genome Research; Physically
            Mapped ESTs
  JOURNAL   Unpublished
COMMENT     Contact: Thomas Hudson
            Whitehead Institute/MIT Center for Genome Research
            Whitehead Institute for Biomedical Research
            9 Cambridge Center, Cambridge MA 02142 USA
            Tel: 617 252 1900
            Fax: 617 252 1902
            Email: thudson@genome.wi.mit.edu
            Primer A: GAACAGGCCCTCCTCTCTCT
            Primer B: ATGGACCCCTTCTCTGCC
            STS size: 127
            PCR Profile:
              Presoak:
              Denaturation:
              Annealing: 56 degrees C
              Polymerization:
              PCR Cycles: 35
              Thermal Cycler:
            Protocol:
              Template: 10 ng
              Primer: each 5 pM
              dNTPs: each 4 nM
              Taq Polymerase: 0.025 units/ul
              Total Vol: 20 ul
            Buffer:
              MgCl2: 1.5 mM
              KCl: 50 mM
              Tris-HCl: 10 mM
              pH: 9.3
            Derived from dbEST (genbank accession U14383).
FEATURES             Location/Qualifiers
     source          1..462
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="364_G_6"
     STS             5..131
     primer_bind     5..24
     primer_bind     complement(114..131)
BASE COUNT       66 a    107 c    104 g     91 t     94 others
ORIGIN



Query Match 51.3%; Score 20; DB 11; Length 462;
Best Local Similarity 100.0%; Pred. No. 0.17;
Matches 20; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         20 gcctaggagtggaactgctg 39
              ||||||||||||||||||||
Db        439 GCCTAGGAGTGGAACTGCTG 458

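The 39-nt query behind the AI126846 hit (Qy 1..39, Score 39, Query Match 100.0%) also produced this G26698 hit (Qy 20..39, Score 20). The Python sketch below assumes that Query Match is the score divided by the best possible score, and that for this nucleotide query the best possible score equals the query length (one point per identical base). That assumption reproduces the 51.3% shown; the constant and function names are illustrative.

```python
QUERY = "tgtgcactcttgggcatacgcctaggagtggaactgctg"  # Qy 1..39 from the AI126846 hit


def query_match_percent(score: float, query: str) -> float:
    """Assumed definition: score / maximum score, with one point per base."""
    max_score = len(query)
    return 100.0 * score / max_score


def aligned_query_segment(query: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive query slice shown on a 'Qy start ... end' line."""
    return query[start - 1:end]


print(f"{query_match_percent(39, QUERY):.1f}%")
print(f"{query_match_percent(20, QUERY):.1f}%")
print(aligned_query_segment(QUERY, 20, 39))
```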

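The G26698 record gives a primer pair, an STS size of 127, a dbEST source (U14383), and primer sites at 5..24 and complement(114..131). GenBank intervals are 1-based and inclusive, and Primer B binds the opposite strand, so it is located by searching for its reverse complement. The Python sketch below performs that coordinate check. The two template fragments are HSU14383 rows copied from the protein-versus-DNA alignment in this report (rows starting at 945 and 1044). Finding the primers in them gives a 127-bp product, which matches the reported STS size. Helper names are illustrative.

```python
PRIMER_A = "GAACAGGCCCTCCTCTCTCT"
PRIMER_B = "ATGGACCCCTTCTCTGCC"

# HSU14383 fragments (1-based start position, sequence) copied from the report.
FRAGMENT_945 = (945, "GGAACAGGCCCTCCTCTCTCTCCAGGCAGGCTTTGGTGGAGGGGCTGGAT")
FRAGMENT_1044 = (1044, "TGCAGGCGGCCGGCAGAGAAGGGGTCCATGATGGCGCCTCGGTGCGGCC")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive GenBank interval start..end."""
    return end - start + 1


def locate(fragment: tuple[int, str], probe: str) -> tuple[int, int]:
    """Return the 1-based (start, end) of probe inside a positioned fragment."""
    offset, seq = fragment
    index = seq.find(probe)
    if index < 0:
        raise ValueError(f"{probe} not found")
    start = offset + index
    return start, start + len(probe) - 1


# Feature-table coordinates on G26698 itself.
assert interval_length(5, 24) == len(PRIMER_A)
assert interval_length(114, 131) == len(PRIMER_B)
sts_size = interval_length(5, 131)

# The same primer pair on the U14383 mRNA it was derived from.
a_start, _ = locate(FRAGMENT_945, PRIMER_A)
_, b_end = locate(FRAGMENT_1044, reverse_complement(PRIMER_B))
print(sts_size, a_start, b_end, interval_length(a_start, b_end))
```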

RESULT 1
AA463832/C
LOCUS       AA463832      206 bp    mRNA    linear   EST 10-JUN-1997
DEFINITION  zx67e07.r1 Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone
            IMAGE:796548 5' similar to contains Alu repetitive element;contains
            element PTR7 repetitive element ;, mRNA sequence.
ACCESSION   AA463832
VERSION     AA463832.1  GI:2188716
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 206)
  AUTHORS   Hillier,L., Allen,M., Bowles,L., Dubuque,T., Geisel,G., Jost,S.,
            Kucaba,T., Lacy,M., Le,N., Lennon,G., Marra,M., Martin,J., Moore,B.,
            Schellenberg,K., Steptoe,M., Tan,F., Theising,B., White,Y.,
            Wylie,T., Waterston,R. and Wilson,R.
  TITLE     WashU-Merck EST Project 1997
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Wilson RK
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Putative full length read
            The vector to vector length is 207
            Seq primer: -28m13 rev2 ET from Amersham.
FEATURES             Location/Qualifiers
     source          1..206
                     /organism="Homo sapiens"
                     /db_xref="GDB:6040837"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:796548"
                     /clone_lib="Soares_total_fetus_Nb2HF8_9w"
                     /dev_stage="8-9 weeks"
                     /lab_host="DH10B"
                     /note="Vector: pT7T3D-Pac (Pharmacia) with a modified
                     polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
                     was prepared from mRNA obtained from pooled 8-9 week
                     (total) fetus material with a Not I - oligo(dT) primer [5'
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCTTAATTTTTTTTTTTTTTTTT 3'].
                     Double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Not I and cloned into the Not I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization, and was
                     constructed by Bento Soares and M. Fatima Bonaldo."
BASE COUNT       59 a     45 c     61 g     41 t
ORIGIN



Query Match 65.0%; Score 26; DB 9; Length 206;
Best Local Similarity 100.0%; Pred. No. 1.4;
Matches 26; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         11 tgcaccaccatgtcctgctaattttt 36
              ||||||||||||||||||||||||||
Db         76 TGCACCACCATGTCCTGCTAATTTTT 51


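Each BASE COUNT line should add up to the length on its LOCUS line, which offers a quick check for OCR damage in these counts. The Python sketch below applies that check to the four records in this excerpt. The G26698 count includes 94 ambiguous bases reported as "others", and the GC percentage is computed over a, c, g and t only. Names are illustrative.

```python
# (LOCUS length, BASE COUNT) as printed for each record.
RECORDS: dict[str, tuple[int, dict[str, int]]] = {
    "HSU14383": (1403, {"a": 254, "c": 490, "g": 413, "t": 246}),
    "AI126846": (464, {"a": 113, "c": 116, "g": 95, "t": 140}),
    "AA463832": (206, {"a": 59, "c": 45, "g": 61, "t": 41}),
    "G26698": (462, {"a": 66, "c": 107, "g": 104, "t": 91, "others": 94}),
}


def counts_match_length(length: int, counts: dict[str, int]) -> bool:
    """True when every counted base, including 'others', sums to the LOCUS length."""
    return sum(counts.values()) == length


def gc_percent(counts: dict[str, int]) -> float:
    """G+C as a percentage of unambiguous bases (a, c, g, t)."""
    acgt = sum(counts[b] for b in "acgt")
    return 100.0 * (counts["c"] + counts["g"]) / acgt


for name, (length, counts) in RECORDS.items():
    status = "ok" if counts_match_length(length, counts) else "MISMATCH"
    print(f"{name}: {status}, GC {gc_percent(counts):.1f}%")
```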

RESULT 4
HSAC000367
LOCUS       HSAC000367   43349 bp    DNA     linear   PRI 28-AUG-1997
DEFINITION  Human Cosmid g1862x055 from 7q31.3, complete sequence.
ACCESSION   AC000367
VERSION     AC000367.1  GI:2347066
KEYWORDS    HTG.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 43349)
  AUTHORS   Iadonato,S.P., Yu,J., Wong,G.K.-S., Magness,C.L., Green,E.D.,
            Green,P. and Olson,M.V.
  TITLE     Large-scale MCD Mapping and Sequencing of Human Chromosome 7
  JOURNAL   Unpublished (1996)
REFERENCE   2  (bases 1 to 43349)
  AUTHORS   Iadonato,S.P., Yu,J., Wong,G.K.-S., Magness,C.L., Green,E.D.,
            Green,P. and Olson,M.V.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-OCT-1996) Human Genome Center, University of
            Washington, Box 352145, Seattle, WA 98195, USA
  REMARK    University of Washington Human Genome Center
            Box 352145 Seattle, WA 98195
            Contact: Shawn Iadonato (iadonato@u.washington.edu)
            Large-scale MCD Mapping and Sequencing of Human Chromosome 7
REFERENCE   3  (bases 1 to 43349)
  AUTHORS   Magness,C.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-MAR-1997) Human Genome Center, University of
            Washington, Box 352145, Seattle, WA 98195, USA
REFERENCE   4  (bases 1 to 43349)
  AUTHORS   Iadonato,S.P., Yu,J., Wong,G.K.-S., Magness,C.L., Green,E.D.,
            Green,P. and Olson,M.V.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-AUG-1997) Human Genome Center, University of
            Washington, Box 352145, Seattle, WA 98195, USA
  REMARK    University of Washington Human Genome Center
            Box 352145 Seattle, WA 98195
            Contact: Shawn Iadonato (iadonato@u.washington.edu)
COMMENT     On Aug 28, 1997 this sequence version replaced gi:1881556.
            Overlapping Sequences:
            5': UWGC:g1862x083 (Genbank Accession: AC002113)
            3': UWGC:g1862d218 (Genbank Accession: AC000373)



            Sequence Quality Assessment:
            This entry has been annotated with sequence quality
            estimates computed by the Phrap assembly program.
            All manually edited bases have been reduced to quality zero.
            Quality levels above 40 are expected to have less than
            1 error in 10,000 bp.

            Pct of Consensus above quality 40: 98.4%
            Number of manually edited bases: 6
            Double stranded (DS) coverage: 88.0%
            DS or two chemistry coverage: 99.2%
            Single stranded regions: 2

            Quality  Cumulative Percentage of Consensus
              90+    xx                    (11.5%)
              80+    xxxxxxxxxx            (50.4%)
              70+    xxxxxxxxxxxxxxxx      (82.3%)
              60+    xxxxxxxxxxxxxxxxxx    (91.4%)
              50+    xxxxxxxxxxxxxxxxxxx   (95.8%)
              40+    xxxxxxxxxxxxxxxxxxxx  (98.4%)
              30+    xxxxxxxxxxxxxxxxxxxx  (99.8%)
              20+    xxxxxxxxxxxxxxxxxxxx  (99.9%)
              10+    xxxxxxxxxxxxxxxxxxxx  (100.0%)
              00+    xxxxxxxxxxxxxxxxxxxx  (100.0%)

            Base-by-base quality values are not generally visible from the
            Genbank flat file format but are available as part
            of this entry's ASN.1 file.
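The statement that quality above 40 means fewer than 1 error in 10,000 bp is consistent with the Phred convention used by Phrap, where a quality value Q corresponds to an estimated error probability P = 10^(-Q/10). The Python sketch below converts between the two and builds a cumulative "percentage of consensus at or above quality" table like the histogram above. The per-base quality list is made-up illustrative data, not values from this entry, and the function names are hypothetical.

```python
import math


def error_probability(q: float) -> float:
    """Phred-scaled quality -> estimated probability that the base is wrong."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> float:
    """Estimated error probability -> Phred-scaled quality."""
    return -10 * math.log10(p)


def cumulative_percentages(qualities, thresholds=(90, 80, 70, 60, 50, 40, 30, 20, 10, 0)):
    """Percent of bases whose quality is at or above each threshold."""
    n = len(qualities)
    return {t: 100.0 * sum(q >= t for q in qualities) / n for t in thresholds}


# Hypothetical per-base qualities for a 10-base consensus (illustration only).
example = [95, 85, 85, 72, 66, 55, 44, 41, 38, 12]
table = cumulative_percentages(example)
for t, pct in table.items():
    print(f"{t:>3}+ {pct:5.1f}%")
```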



            Sequence Validation:
            This sequence has been validated by Multiple Complete Digest
            Mapping. Comparison of the experimentally derived map digest
            fragments with sequence-predicted fragments is given below.
            Small fragments below a variable cutoff (approximately 400-600bp)
            are not mapped and hence do not appear in the table. There are no
            significant remaining discrepancies between the experimental and
            predicted values. Uniquely ordered fragment groups are separated
            by dashed lines.



            Digests: EcoRI, HindIII, NsiI (fragment sizes in bp)
                Map        Seq
            5124.86    5144.00
            4213.20    4271.00
            2634.83    2642.00
            1562.97    1593.00
            7922.52    7935.00
            1789.10    1797.00
            5177.32    5193.00
             490.78     485.00
             699.88     702.00
            3960.50    3986.00
            3168.35    3198.00
            5150.11    5177.00
            9380.91    9346.00
            2464.68    2438.00
            1216.04    1206.00
            5538.62    5591.00
            6305.33    6300.00
            2719.74    2726.00
            2565.93    2552.00
            2757.13    2756.00
            1428.27    1446.00
            4417.36    4392.00
            6579.15    6623.00
            5504.09    5500.00
            1651.21    1636.00
            5153.00    5085.00
            3214.21    3189.00
            1088.29    1089.00
            4175.13    4192.00
            2292.35    2307.00
             820.75     817.00
            1471.36    1489.00
            1022.53    1021.00
            1338.53    1334.00
            1602.67    1586.00
            2084.45    2104.00

FEATURES             Location/Qualifiers
     source          1..43349
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="7"
                     /map="7q31.3"
                     /clone="NHGRI:yWSS1862"
                     /sub_clone="UWGC:g1862x055"
                     /cell_line="GM10791"
                     /clone_lib="E. Green Chromosome 7 YAC Resource"
     repeat_region   complement(688..2990)
                     /rpt_family="Tigger1"
     repeat_region   complement(9460..9602)
                     /rpt_family="MER3"
     repeat_region   complement(10588..10883)
                     /rpt_family="ALU"
     repeat_region   10949..11278
                     /rpt_family="ALU"
     repeat_region   complement(22058..22282)
                     /rpt_family="MER3"
     repeat_region   complement(24214..25023)
                     /rpt_family="L1"
     repeat_region   25021..25070
                     /rpt_family="L1"
     repeat_region   complement(30836..31068)
                     /rpt_family="L1"
     repeat_region   complement(32584..32877)
                     /rpt_family="ALU"
     repeat_region   complement(38416..38541)
                     /rpt_family="ALU"
BASE COUNT    13710 a   8483 c   7675 g  13481 t
ORIGIN



Query Match 50.0%; Score 20; DB 9; Length 43349;
Best Local Similarity 100.0%; Pred. No. 0.89;
Matches 20; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          3 tacaggtttgcaccaccgtg 22
              ||||||||||||||||||||
Db      32851 TACAGGTTTGCACCACCGTG 32870
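The Sequence Validation comment for HSAC000367 compares experimentally mapped fragment sizes (Map) with sizes predicted from the sequence (Seq). A minimal Python sketch of that comparison follows, using a few Map/Seq pairs from the table in that entry. Measuring the discrepancy relative to the predicted size and the 2% default tolerance are assumptions for illustration, not the criterion used by the submitters, and the function names are hypothetical.

```python
from typing import List, Tuple

# (experimental map size, sequence-predicted size) in bp, from the HSAC000367 table.
PAIRS: List[Tuple[float, int]] = [
    (5124.86, 5144),
    (490.78, 485),
    (9380.91, 9346),
    (5153.00, 5085),
]


def percent_discrepancy(mapped: float, predicted: int) -> float:
    """Signed difference of the mapped size from the predicted size, in percent."""
    return 100.0 * (mapped - predicted) / predicted


def flag_discrepancies(pairs, tolerance_pct: float = 2.0):
    """Return the pairs whose absolute discrepancy exceeds the tolerance."""
    return [(m, s) for m, s in pairs if abs(percent_discrepancy(m, s)) > tolerance_pct]


for mapped, predicted in PAIRS:
    print(f"{mapped:>9.2f} {predicted:>6d} {percent_discrepancy(mapped, predicted):+6.2f}%")
```

With the assumed 2% tolerance none of these four pairs is flagged; tightening it to 1% flags the 490.78/485 and 5153.00/5085 pairs.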



RESULT 7 

HSU14383
LOCUS       HSU14383 1403 bp mRNA linear PRI 31-DEC-1994
DEFINITION  Human mucin (MUC8) mRNA, partial cds.
ACCESSION   U14383 U04799
VERSION     U14383.1 GI:606953
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 907 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Carboxy terminus of a major human tracheobronchial mucin MUC8
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 941)
  AUTHORS   Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
  TITLE     A novel human airway mucin cDNA encodes a protein with unique
            tandem-repeat organization
  JOURNAL   Biochem. J. 300 (Pt 2), 295-298 (1994)
  MEDLINE   94271137
REFERENCE   3 (bases 1 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
            Oklahoma Health Sciences Center, Medicinal Chemistry, College of
            Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT     On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES             Location/Qualifiers
     source          1..1403
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="PAM2"
                     /tissue_type="trachea/bronchus"
                     /dev_stage="adult"
     repeat_region   <1..892
                     /rpt_type=tandem
                     /rpt_unit=3..43
     gene            1..1403
                     /gene="MUC8"
     CDS             <1..944
                     /gene="MUC8"
                     /codon_start=3
                     /product="mucin"
                     /protein_id="AAA58346.1"
                     /db_xref="GI:501033"
                     /translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
                     PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
                     QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
                     RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
                     PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
                     PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
     3'UTR           945..1403
                     /gene="MUC8"
     polyA_signal    1350..1355
                     /gene="MUC8"
     polyA_site      1403
                     /gene="MUC8"
                     /note="17 A residues"
BASE COUNT      254 a    490 c    413 g    246 t
ORIGIN



Query Match 51.3%; Score 20; DB 9; Length 1403;
Best Local Similarity 100.0%; Pred. No. 0.15;
Matches 20; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         20 gcctaggagtggaactgctg 39
              ||||||||||||||||||||
Db       1380 GCCTAGGAGTGGAACTGCTG 1399
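Each hit in this listing reports a Matches count together with Qy/Db coordinates. The Python sketch below recomputes those figures from the two aligned strings of an ungapped hit, using the RESULT 7 alignment (Qy 20..39 against Db 1380..1399) as input. The function name is hypothetical, and the check handles only gap-free alignments such as those shown here.

```python
def check_ungapped(qy: str, q_start: int, q_end: int,
                   db: str, d_start: int, d_end: int) -> dict:
    """Summarise an ungapped Qy/Db alignment (case-insensitive comparison)."""
    if len(qy) != len(db):
        raise ValueError("ungapped alignment strings must have equal length")
    matches = sum(a == b for a, b in zip(qy.upper(), db.upper()))
    return {
        "length": len(qy),
        "matches": matches,
        "mismatches": len(qy) - matches,
        # Coordinates are inclusive, so a span covers end - start + 1 bases.
        "qy_span_ok": q_end - q_start + 1 == len(qy),
        # abs() tolerates Db coordinates that run downward on the other strand.
        "db_span_ok": abs(d_end - d_start) + 1 == len(db),
        "identity_pct": 100.0 * matches / len(qy),
    }


# RESULT 7: Qy 20..39 against Db 1380..1399.
summary = check_ungapped("gcctaggagtggaactgctg", 20, 39,
                         "GCCTAGGAGTGGAACTGCTG", 1380, 1399)
print(summary)
```

For this gap-free hit the summary should reproduce the reported "Matches 20" and "Best Local Similarity 100.0%".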



RESULT 5 

SCE63
LOCUS       SCE63 37200 bp DNA linear BCT 17-MAR-1999
DEFINITION  Streptomyces coelicolor cosmid E63.
ACCESSION   AL035640
VERSION     AL035640.2 GI:4500374
KEYWORDS    4-hydroxyphenylpyruvic acid dioxygenase; aminotransferase;
            AMP-binding domain; CDA peptide synthetase; cdaPS1; cdaPS2;
            cdaPS3; DUF4, Domain of unknown function; glycolate oxidase;
            Phosphopantetheine attachment site.
SOURCE      Streptomyces coelicolor A3(2).
  ORGANISM  Streptomyces coelicolor A3(2)
            Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
            Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
REFERENCE   1 (bases 1 to 37200)
  AUTHORS   Redenbach,M., Kieser,H.M., Denapaite,D., Eichner,A., Cullum,J.,
            Kinashi,H. and Hopwood,D.A.
  TITLE     A set of ordered cosmids and a detailed genetic and physical map
            for the 8 Mb Streptomyces coelicolor A3(2) chromosome
  JOURNAL   Mol. Microbiol. 21 (1), 77-96 (1996)
  MEDLINE   97000351
REFERENCE   2 (bases 1 to 37200)
  AUTHORS   Saunder,D.C and Harris,D.
  JOURNAL   Unpublished
REFERENCE   3 (bases 1 to 37200)
  AUTHORS   Bentley,S.D., Parkhill,J., Barrell,B.G. and Rajandream,M.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-1999) Streptomyces coelicolor sequencing project,
            Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge
            CB10 1SA E-mail: barrell@sanger.ac.uk Cosmids supplied by Prof.
            David A. Hopwood, [3] John Innes Centre, Norwich Research Park,
            Colney, Norwich, Norfolk NR4 7UH, UK
COMMENT     On Mar 24, 1999 this sequence version replaced gi:4481931.
            Notes:
            Streptomyces coelicolor sequencing at The Sanger Centre is funded
            by the BBSRC.

            Details of S. coelicolor sequencing at the Sanger Centre are
            available on the World Wide Web.

            (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/) CDS are
            numbered using the following system eg SC7B7.01c. SC (S.
            coelicolor), 7B7 (cosmid name), .01 (first CDS), c (complementary
            strand).
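The CDS naming scheme described above (SC, then the cosmid name, then the CDS number, then an optional "c" for the complementary strand) can be split mechanically. A minimal Python sketch, assuming names always follow the SC<cosmid>.<number>[c] pattern seen in this entry; the regular expression and function name are illustrative and not part of any Sanger tool.

```python
import re
from typing import NamedTuple


class CdsName(NamedTuple):
    cosmid: str       # e.g. "7B7" or "E63"
    index: int        # CDS number within the cosmid, e.g. 1 for ".01"
    complement: bool  # True when a trailing "c" marks the complementary strand


_CDS_RE = re.compile(r"^SC([A-Za-z0-9]+)\.(\d+)(c?)$")


def parse_cds_name(name: str) -> CdsName:
    """Split a name such as 'SC7B7.01c' into cosmid, CDS number and strand flag."""
    m = _CDS_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised CDS name: {name!r}")
    cosmid, number, strand = m.groups()
    return CdsName(cosmid, int(number), strand == "c")


print(parse_cds_name("SC7B7.01c"))
print(parse_cds_name("SCE63.05"))
```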

            The more significant matches with motifs in the PROSITE database
            are also included but some of these may be fortuitous. The length
            in codons is given for each CDS.

            Usually the highest scoring match found by fasta -o is given for
            CDS which show significant similarity to other CDS in the database.
            The positions of possible ribosome binding site sequences are given
            where these have been used to deduce the initiation codon. Gene
            prediction is based on positional base preference in codons using a
            specially developed Hidden Markov Model (Krogh et al., Nucleic
            Acids Research, 22(22):4768-4778 (1994)) and the FramePlot program
            of Bibb et al., Gene 30:157-66 (1984) as implemented at
            http://www.nih.go.jp/~jun/cgi-bin/frameplot.pl. CAUTION: We may not
            have predicted the correct initiation codon. Where possible we
            choose an initiation codon (atg, gtg, ttg or (att)) which is
            preceded by an upstream ribosome binding site sequence (optimally
            5-13bp before the initiation codon). If this cannot be identified
            we choose the most upstream initiation codon.
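The initiation-codon rule stated above can be sketched as follows. This is an illustrative Python reading of the rule, not the Sanger pipeline. It assumes a Shine-Dalgarno core of GGAGG (the record does not name a motif), treats "5-13bp before the initiation codon" as the spacer between the end of the motif and the first base of the start codon, skips the rarely used ATT start, and picks the most upstream qualifying codon when several carry an RBS. All names are hypothetical.

```python
from typing import List, Optional

START_CODONS = ("ATG", "GTG", "TTG")  # the record also allows ATT; omitted here
STOP_CODONS = ("TAA", "TAG", "TGA")
RBS_MOTIF = "GGAGG"  # assumed Shine-Dalgarno core; not given in the record


def has_rbs(seq: str, start: int, motif: str = RBS_MOTIF,
            min_gap: int = 5, max_gap: int = 13) -> bool:
    """True if motif ends min_gap..max_gap bases before the codon at `start`."""
    for gap in range(min_gap, max_gap + 1):
        end = start - gap              # motif would occupy seq[end - len(motif):end]
        begin = end - len(motif)
        if begin >= 0 and seq[begin:end] == motif:
            return True
    return False


def candidate_starts(seq: str, stop: int) -> List[int]:
    """In-frame start codons upstream of the stop codon at 0-based index `stop`."""
    starts = []
    for i in range(stop - 3, stop % 3 - 1, -3):  # walk upstream one codon at a time
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break  # an upstream in-frame stop closes the open reading frame
        if codon in START_CODONS:
            starts.append(i)
    return sorted(starts)  # most upstream first


def choose_start(seq: str, stop: int) -> Optional[int]:
    """Prefer the most upstream start with an RBS; else the most upstream start."""
    starts = candidate_starts(seq, stop)
    if not starts:
        return None
    with_rbs = [s for s in starts if has_rbs(seq, s)]
    return with_rbs[0] if with_rbs else starts[0]


demo = "ATG" + "C" + "GGAGG" + "CCCCCCCCC" + "ATG" + "GCC" + "TAA"
print(choose_start(demo, 24))  # 18: the ATG with GGAGG nine bases upstream
```

In the toy sequence the downstream ATG wins because only it has the assumed motif at a 5-13 bp spacing; without the motif the rule falls back to the most upstream ATG at index 0.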

            IMPORTANT: This sequence MAY NOT be the entire insert of the
            sequenced clone. It may be shorter because we only sequence
            overlapping sections once, or longer, because we arrange for a
            small overlap between neighbouring submissions. Cosmid E63 lies
            between E8 and E29 on the AseI-E genomic restriction fragment.
FEATURES             Location/Qualifiers
     source          1..37200
                     /organism="Streptomyces coelicolor A3(2)"
                     /strain="A3(2)"
                     /db_xref="taxon:100226"
                     /clone="cosmid E63"
     gene            complement(1..138)
                     /gene="SCE63.06"
     CDS             complement(<1..138)
                     /gene="SCE63.06"
                     /note="SCE63.06, partial CDS, possible aminotransferase,
                     len: >45aa; similar to the N-terminal region of TR:O52815
                     (EMBL:AJ223998) a protein similar to aminotransferase from
                     a cluster of genes involved in the biosynthesis of a
                     vancomycin group antibiotic in Amycolatopsis orientalis
                     (Actinomycete) (438 aa) fasta scores; opt: 87, z-score:
                     147.0, E(): 0.69, (45.2% identity in 31 aa overlap). The
                     remainder of this CDS lies on the neighbouring
                     Streptomyces coelicolor cosmid E8."
                     /codon_start=1
                     /transl_table=11
                     /label=SCE63.06
                     /product="putative aminotransferase"
                     /protein_id="CAB38521.1"
                     /db_xref="GI:4481937"
                     /translation="MTTTTPARTDRDGVLARTALHPSLSAPILDTMNFLNEVTLRYPQ
                     AI"

     gene            complement(135..1268)
                     /gene="SCE63.05"
     CDS             complement(135..1268)
                     /gene="SCE63.05"
                     /note="SCE63.05, probable glycolate oxidase, len: 377aa;
                     similar to many both prokaryotic and eukaryotic egs.
                     TR:O52792 (EMBL:AJ223998) similar to glycolate oxidase
                     from a cluster of genes involved in the biosynthesis of a
                     vancomycin group antibiotic in Amycolatopsis orientalis
                     (Actinomycete) (357 aa) fasta scores; opt: 995, z-score:
                     998.4, E(): 0, (47.5% identity in 343 aa overlap) and
                     SW:GOX_SPIOL glycolate oxidase from Spinacia oleracea
                     (spinach) (369 aa) fasta scores; opt: 917, z-score: 920.7,
                     E(): 0, (41.9% identity in 360 aa overlap). Contains Pfam
                     match to entry PF01070 FMN_dh, FMN-dependent
                     dehydrogenase, score 412.60, E-value 3.6e-120 and PS00557
                     FMN-dependent alpha-hydroxy acid dehydrogenases active
                     site."
                     /codon_start=1
                     /transl_table=11
                     /label=SCE63.05
                     /product="putative glycolate oxidase"
                     /protein_id="CAB38520.1"
                     /db_xref="GI:4481936"
                     /translation="MREPLTLDDFARLARGQLPAATWDFIAGGAGRERTLAANEAVFG
                     AVRLRPRALPGIEEPDTSVEVLGSRWPAPVGIAPVAYHGLAHPDGEPATAAAAGALGL
                     PLWSTFAGRSLEEVARAASAPLWLQLYCFRDHETTLGLARRi^RDSGYQALVLTVDTP
                     FTGRRLRDLRNGFAVPAHITPANLTGTAAAGSATPGAHSRLAFDRRLDWSFVARLGAA
                     SGLPVLAKGVLTAPDAEAAVAAGVAGIWSNHGGRQLDGAPATLEALPEWSAVRGRC
                     PVLLDGGVRTGADVLAALALGARAVLVGRPALYALAVGGASGVRRMLTLLTEDFADTM
                     VLTGHAATGTIGPDTLAPPHHAPPHHGPPTAPRPAPHRDRSHG"
     misc_feature    complement(213..1229)
                     /gene="SCE63.05"
                     /note="Pfam match to entry PF01070 FMN_dh, FMN-dependent
                     dehydrogenase, score 412.60, E-value 3.6e-120"
     misc_feature    complement(507..527)
                     /gene="SCE63.05"
                     /note="PS00557 FMN-dependent alpha-hydroxy acid
                     dehydrogenases active site."
                     /label=*

     gene            complement(1418..2533)
                     /gene="SCE63.04"
     CDS             complement(1418..2533)
                     /gene="SCE63.04"
                     /note="SCE63.04, probable 4-hydroxyphenylpyruvic acid
                     dioxygenase, len: 371aa; similar to many both prokaryotic
                     and eukaryotic egs. TR:O52791 (EMBL:AJ223998) similar to
                     hydroxyphenyl pyruvate dioxygenase from a cluster of genes
                     involved in the biosynthesis of a vancomycin group
                     antibiotic in Amycolatopsis orientalis (Actinomycete) (357
                     aa) fasta scores; opt: 989, z-score: 1130.3, E(): 0,
                     (48.7% identity in 355 aa overlap) and SW:HPPD_MOUSE hpd,
                     4-hydroxyphenylpyruvic acid dioxygenase from Mus musculus
                     (mouse) (392 aa) fasta scores; opt: 610, z-score: 698.5,
                     E(): 1.3e-31, (31.6% identity in 361 aa overlap)."
                     /codon_start=1
                     /transl_table=11
                     /label=SCE63.04
                     /product="putative 4-hydroxyphenylpyruvic acid
                     dioxygenase"
                     /protein_id="CAB38519.1"
                     /db_xref="GI:4481935"
                     /translation="MLPPFPFLHWRAAMPPSDIAYAELYVAEDREASGFLVDSLGFVP
                     LAVAGPATGTHDRRSTVLRSGEVTLWTQALAPDTPVARYVE.RHGDSIADLAFGCDDV
                     RSCFDRAVLAGAEALQAPTPSHRAGQDAWFATVSGFGDIRHl'LVPAADGDGAGLLPPD
                     RDWALLPAATGRTGPRPLLDHVAVCLESGTLRSTAEFYEAAFDMPYYSSEYIEVGEQA
                     MDSIWRNAGGGITFTLIEPDDTRVPGQIDQFLSAHDGPGVQHLAFLVDDIVGSVRSL
                     GDRGVAFLRTPGAYYDLLTERVGAMADAIEDLRETNVLADRDEWGYLLQIFTRSPYPR
                     GTLFYEYIQRNGARGFGSSNIKALYEAVEREREVAGR"

     gene            2802..25193
                     /gene="SCE63.03c"
                     /note="cdaPSI"
     CDS             2802..25193
                     /gene="SCE63.03c"
                     /note="SCE63.03c, cdaPSI, CDA peptide synthetase I, len:
                     7463aa; part of the calcium-dependent antibiotic (CDA)
                     biosynthetic cluster from Streptomyces coelicolor. CDA is
                     a peptide antibiotic which is synthesised non-ribosomally
                     by a putative multifunctional peptide synthetase enzyme.
                     This CDS encodes a subunit of this enzyme and is suspected
                     to be responsible for the incorporation of the first 6
                     amino acids into the antibiotic structure. This ORF
                     overlaps the downstream (cdaPSII) ORF by one base
                     indicating possible translational coupling. Contains eight
                     separate Pfam matches to entry PF00668 DUF4, Domain of
                     unknown function (U), six separate Pfam matches to entry
                     PF00501 AMP-binding, AMP-binding enzyme (A) and six
                     separate Pfam matches to entry PF00550 pp-binding,
                     Phosphopantetheine attachment site (P). These Pfam matches
                     cover the full length of the protein in the following
                     order from N to C-terminal
                     U-A-P-U-A-P-U-A-P-U-U-A-P-U-A-P-U-A-P-U."
                     /codon_start=1
                     /transl_table=11
                     /label=cdaPSI
                     /product="CDA peptide synthetase I"
                     /protein_id="CAB38518.1"
                     /db_xref="GI:4481934"

                     /translation="MSENSSVRHGLTSAQHEVWLAQQLDPRGAHYRTGSCLEIDGPLD
                     HAVLSRALRLTVAGTETLCSRFLTDEEGRPYRAYCPPAPEGSAAVEDPDGVPYTPVLL
                     RHIDLSGHEDPEGEAQRWMDRDRATPLPLDRPGLSSHALFTLGGGRHLYYLGVHHIVI
                     DGTSMALFYERIAEWRALRDGRAVPAAAFGDTDRMVAGEEAYRASARYERDRAYWTG
                     LFTDRPEPVSLTGRGGGRALAPTVRSLGLPPERTEVLGRAAEATGAHWARWIAGVAA
                     FLHRTTGARDVWSVPVTGRYGANARITPGMVSNRLPLRLAVRPGESFARWETVSEA
                     MSGLLAHSRFRGEDLDRELGGAGVSGPTVNVMPYIRPVDFGGPVGLMRSISSGPTTDL
                     NIVLTGTPESGLRVDFEGNPQVYGGQDLTVLQERFVRFLAE1AADPAATVDEVALLTP
                     DERERVLDGWNDTAHEVPETTLPELFAARAARTPGHEALVYEIGTSLTYAELDARAERL
                     AGALTARGAGPERFVAVAVERSAELWALLAVLKSGAAYVPVDPGYPADRIAHILRDA
                     GAMLVLTTRDTAERLPGDGTPRLLLDEPAAAGTTAAGAPAPFGTLPRALPAPGHPAYV
                     IYTSGSTGRPKGWISHRAIWRLAWMQDTYGLEPSDRVLQKTPSGFDVSVWEFFWPL



Query Match 43.6%; Score 17; DB 1; Length 37200;
Best Local Similarity 100.0%; Pred. No. 25;
Matches 17; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          6 ctgcgccacctcaaccc 22
              |||||||||||||||||
Db      13854 CTGCGCCACCTCAACCC 13870



RESULT 4 
US-09-103-330-35 

; Sequence 35, Application US/09103330A
; Patent No. 6319716
; GENERAL INFORMATION:
;  APPLICANT: TIKOO, SURESH K.
;  APPLICANT: BABIUK, LORNE A.
;  APPLICANT: REDDY, POLICE S.
;  TITLE OF INVENTION: ISOLATION OF MUTANTS IN THE E3 REGION OF THE
;  TITLE OF INVENTION: BOVINE ADENOVIRUS GENOME AND THEIR USE IN VACCINES
;  FILE REFERENCE: 293102002121
;  CURRENT APPLICATION NUMBER: US/09/103,330A
;  CURRENT FILING DATE: 1998-06-23
;  EARLIER APPLICATION NUMBER: 08/880,234
;  EARLIER FILING DATE: 1997-06-23
;  EARLIER APPLICATION NUMBER: 08/164,292
;  EARLIER FILING DATE: 1993-12-09
;  NUMBER OF SEQ ID NOS: 40
;  SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 35
;  LENGTH: 34446
;  TYPE: DNA
;  ORGANISM: Bovine adenovirus type 3
US-09-103-330-35



Query Match 38.5%; Score 15; DB 4; Length 34446;
Best Local Similarity 100.0%; Pred. No. 2.7;
Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          7 ctcttgggcatacgc 21
              |||||||||||||||
Db      13657 ctcttgggcatacgc 13671



RESULT 1 

G26698
LOCUS       G26698 462 bp DNA linear STS 02-JUN-1996
DEFINITION  human STS TH, sequence tagged site.
ACCESSION   G26698
VERSION     G26698.1 GI:1348930
KEYWORDS    STS; STS sequence; primer; sequence tagged site.
SOURCE      human STSs derived from sequences in dbEST and the Unigene
            collection.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 462)
  AUTHORS   Hudson,T.
  TITLE     Whitehead Institute/MIT Center for Genome Research; Physically
            Mapped STSs
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 462)
  AUTHORS   Hudson,T.
  TITLE     Whitehead Institute/MIT Center for Genome Research; Physically
            Mapped ESTs
  JOURNAL   Unpublished
COMMENT     Contact: Thomas Hudson
            Whitehead Institute/MIT Center for Genome Research
            Whitehead Institute for Biomedical Research
            9 Cambridge Center, Cambridge MA 02142 USA
            Tel: 617 252 1900
            Fax: 617 252 1902
            Email: thudson@genome.wi.mit.edu
            Primer A: GAACAGGCCCTCCTCTCTCT
            Primer B: ATGGACCCCTTCTCTGCC
            STS size: 127
            PCR Profile:
              Presoak:
              Denaturation:
              Annealing: 56 degrees C
              Polymerization:
              PCR Cycles: 35
              Thermal Cycler:
            Protocol:
              Template: 10 ng
              Primer: each 5 pM
              dNTPs: each 4 nM
              Taq Polymerase: 0.025 units/ul
              Total Vol: 20 ul
            Buffer:
              MgCl2: 1.5 mM
              KCl: 50 mM
              Tris-HCl: 10 mM
              pH: 9.3
            Derived from dbEST (genbank accession U14383).
FEATURES             Location/Qualifiers
     source          1..462
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="364_G_6"
     STS             5..131
     primer_bind     5..24
     primer_bind     complement(114..131)
BASE COUNT       66 a    107 c    104 g     91 t     94 others
ORIGIN



Query Match 69.2%; Score 27; DB 11; Length 462;
Best Local Similarity 100.0%; Pred. No. 2.2e-05;
Matches 27; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         13 ggcatatgcctaggagtggaactgctg 39
              |||||||||||||||||||||||||||
Db        432 GGCATATGCCTAGGAGTGGAACTGCTG 458
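In the G26698 entry, the primer_bind features, the primer sequences and the "STS size: 127" line in the COMMENT should all agree. A small Python check under that reading follows. The location parser is a hypothetical helper that handles only the two simple location forms used in this entry, not the full GenBank location grammar.

```python
import re
from typing import Tuple

PRIMER_A = "GAACAGGCCCTCCTCTCTCT"  # forward primer, primer_bind 5..24
PRIMER_B = "ATGGACCCCTTCTCTGCC"    # reverse primer, primer_bind complement(114..131)


def parse_span(location: str) -> Tuple[int, int]:
    """Parse a simple location such as '5..24' or 'complement(114..131)'."""
    m = re.fullmatch(r"(complement\()?(\d+)\.\.(\d+)(\))?", location)
    if m is None or bool(m.group(1)) != bool(m.group(4)):
        raise ValueError(f"unsupported location: {location!r}")
    return int(m.group(2)), int(m.group(3))


def span_length(span: Tuple[int, int]) -> int:
    """Inclusive length of a 1-based (start, end) span."""
    return span[1] - span[0] + 1


fwd = parse_span("5..24")
rev = parse_span("complement(114..131)")
product = (fwd[0], rev[1])  # from the 5' end of primer A to the 5' end of primer B

assert span_length(fwd) == len(PRIMER_A)  # 20 bases
assert span_length(rev) == len(PRIMER_B)  # 18 bases
print("STS size:", span_length(product))  # 127, matching the COMMENT and STS 5..131
```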



RESULT 2 

HSU14383
LOCUS       HSU14383 1403 bp mRNA linear PRI 31-DEC-1994
DEFINITION  Human mucin (MUC8) mRNA, partial cds.
ACCESSION   U14383 U04799
VERSION     U14383.1 GI:606953
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 907 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Carboxy terminus of a major human tracheobronchial mucin MUC8
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 941)
  AUTHORS   Shankar,V., Gilmore,M.S., Elkins,R.C. and Sachdev,G.P.
  TITLE     A novel human airway mucin cDNA encodes a protein with unique
            tandem-repeat organization
  JOURNAL   Biochem. J. 300 (Pt 2), 295-298 (1994)
  MEDLINE   94271137
REFERENCE   3 (bases 1 to 1403)
  AUTHORS   Sachdev,G.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-SEP-1994) Goverdhan P. Sachdev, University of
            Oklahoma Health Sciences Center, Medicinal Chemistry, College of
            Pharmacy, 1110 N. Stonewall Avenue, Oklahoma City, OK 73117, USA
COMMENT     On Jan 1, 1995 this sequence version replaced gi:537430.
FEATURES             Location/Qualifiers
     source          1..1403
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="PAM2"
                     /tissue_type="trachea/bronchus"
                     /dev_stage="adult"
     repeat_region   <1..892
                     /rpt_type=tandem
                     /rpt_unit=3..43
     gene            1..1403
                     /gene="MUC8"
     CDS             <1..944
                     /gene="MUC8"
                     /codon_start=3
                     /product="mucin"
                     /protein_id="AAA58346.1"
                     /db_xref="GI:501033"
                     /translation="TSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGGDTGFMSC
                     PRPFQEGTPGSRAAHVLSRKGPRVHELPTSSPGRDPGFTSCPRPLQEGTRVTNCPRPL
                     QEGTPGSRAAHVLSRRGHRVHELPTPSPGRDPGFMSCPRPLQEGTRVTNCPRPLQEGT
                     RVTSCPRRLQEGTRVTSCPRPLQEGTRVTNCPRALQEGTPGSRAAHALSRKGPRVHEL
                     PTSSPGGDTGFTSCPRPLQEGTPGSRAAHALSRRGHRVHELPTSSPGRDPGHELPTSS
                     PGGDTGFTSCPRTFQEGTPGSGLLPAHIVPLCKSEER"
     3'UTR           945..1403
                     /gene="MUC8"
     polyA_signal    1350..1355
                     /gene="MUC8"
     polyA_site      1403
                     /gene="MUC8"
                     /note="17 A residues"
BASE COUNT      254 a    490 c    413 g    246 t
ORIGIN



Query Match 69.2%; Score 27; DB 9; Length 1403;
Best Local Similarity 100.0%; Pred. No. 1.9e-05;
Matches 27; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         13 ggcatatgcctaggagtggaactgctg 39
              |||||||||||||||||||||||||||
Db       1373 GGCATATGCCTAGGAGTGGAACTGCTG 1399






RESULT 11 






AAV37951
ID   AAV37951 standard; DNA; 2962 BP.
XX
AC   AAV37951;
XX
DT   10-SEP-1998  (first entry)
XX
DE   Human erythropoietin clone 7.2 encoding DNA.
XX
KW   Human; erythropoietin; EPO; Chinese hamster ova,
KW   medicine; biological research; ss.
XX
OS   Homo sapiens.
XX
FH   Key             Location/Qualifiers
FT   CDS             625..2772
FT                   /*tag= a
FT                   /product= "erythropoietin"
FT                   /note= "contains introns"
FT   exon            625..640
FT                   /*tag= b
FT                   /number= 1
FT   intron          641..1201
FT                   /*tag= c
FT                   /number= 1
FT   exon            1202..1347
FT                   /*tag= d
FT                   /number= 2
FT   intron          1348..1605
FT                   /*tag= e
FT                   /number= 2
FT   exon            1606..1692
FT                   /*tag= f
FT                   /number= 3
FT   intron          [location illegible]
FT                   /*tag= g
FT                   /number= 3
FT   exon            [location illegible]
FT                   /*tag= h
FT                   /number= 4
FT   intron          2485..2616
FT                   /*tag= i
FT                   /number= 4
FT   exon            2617..2772
FT                   /*tag= j
FT                   /number= 5
XX
PN   RU2089611-C1.
XX
PD   10-SEP-1997.
XX
PF   13-JUL-1995;   95RU-0111858.
XX
PR   13-JUL-1995;   95RU-0111858.
XX
PA   (MEDB-) MED BIOTECHN RES PRODN CENTRE.
XX
PI   Kamerova IA, Kolobkov SL, Zelenin MG;
XX
DR   WPI; 1998-205757/18.
DR   P-PSDB; AAW62048.
XX
PT   New strain of cultivated cells of Chinese hamster - acts as producer
PT   of human erythropoietin which can be used in medicine and in
PT   biological research
XX
PS   Disclosure; Column 15-22; 13pp; English.
XX
CC   The present sequence encodes human erythropoietin clone 7.2 from
CC   the present invention. The present invention describes a new CHO
CC   strain of cultivated cells of Chinese hamster VSKK (P) 637 D, which
CC   produces human erythropoietin. The new strain is used as a new
CC   strain-producer of human erythropoietin, which can be used in medical
CC   therapy and research, and also in biological research. The use of the
CC   strain reduces the cost of production of human erythropoietin owing
CC   to increased productivity of the strain.
XX
SQ   Sequence 2962 BP; 596 A; 909 C; 881 G; 576 T; 0 other;



Query Match 38.5%; Score 15; DB 19; Length 2962;
Best Local Similarity 100.0%; Pred. No. 35;
Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         14 cctcaacccaggcgt 28
              |||||||||||||||
Db        248 cctcaacccaggcgt 262
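The SQ line of a GeneSeq entry summarizes base composition. A minimal Python sketch that parses the SQ line shown for AAV37951 and derives the G+C content follows. Using only A+C+G+T in the denominator (ignoring "other" bases) is an assumption of this sketch, and the function names are hypothetical.

```python
import re

SQ_LINE = "SQ   Sequence 2962 BP; 596 A; 909 C; 881 G; 576 T; 0 other;"


def parse_sq(line: str) -> dict:
    """Extract the total length and per-base counts from a GeneSeq SQ line."""
    total = int(re.search(r"Sequence (\d+) BP;", line).group(1))
    # findall with two groups returns (count, base) tuples, e.g. ("596", "A").
    counts = {base: int(n) for n, base in re.findall(r"(\d+) (A|C|G|T|other);", line)}
    return {"total": total, "counts": counts}


def gc_percent(counts: dict) -> float:
    """G+C as a percentage of A+C+G+T (bases counted as 'other' are ignored)."""
    acgt = sum(counts.get(b, 0) for b in "ACGT")
    return 100.0 * (counts.get("C", 0) + counts.get("G", 0)) / acgt


info = parse_sq(SQ_LINE)
print(info["counts"], f"GC = {gc_percent(info['counts']):.2f}%")
```

The same parser applies unchanged to the SQ lines of the other GeneSeq hits in this listing.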



RESULT 12 
AAA71992 

ID   AAA71992 standard; DNA; 2962 BP.
XX
AC   AAA71992;
XX
DT   19-JAN-2001  (first entry)
XX
DE   Human erythropoietin DNA from clone 7.2.
XX
KW   Erythropoietin; human; antianemic; late erythrocyte precursor cell;
KW   replacement therapy; treatment; ds.
XX
OS   Homo sapiens.
XX
FH   Key             Location/Qualifiers
FT   CDS             625..2772
FT                   /*tag= a
FT                   /product= "erythropoietin"
FT   sig_peptide     625..1269
FT                   /*tag= b
FT   mat_peptide     1270..2769
FT                   /*tag= c
FT   exon            625..640
FT                   /*tag= d
FT                   /number= 1
FT   intron          641..1201
FT                   /*tag= e
FT                   /number= 1
FT   exon            1202..1347
FT                   /*tag= f
FT                   /number= 2
FT   intron          1348..1605
FT                   /*tag= g
FT                   /number= 2
FT   exon            1606..1693
FT                   /*tag= h
FT                   /number= 3
FT   intron          1694..2304
FT                   /*tag= i
FT                   /number= 3
FT   exon            2305..2484
FT                   /*tag= j
FT                   /number= 4
FT   intron          2485..2616
FT                   /*tag= k
FT                   /number= 4
FT   exon            2617..2772
FT                   /*tag= l
FT                   /number= 5
XX
PN   DE19855489-A1.
XX
PD   17-AUG-2000.
XX
PF   01-DEC-1998;   98DE-1055489.
XX
PR   01-DEC-1998;   98DE-1055489.
XX
PA   (GROZ/) GROZA I.
XX
DR   WPI; 2000-566040/53.
DR   P-PSDB; AAB10654.
XX
PT   New nucleic acid molecule comprising simian virus 40 regulatory
PT   sequences and antibiotic resistance gene, useful for expressing
PT   erythropoietin in mammalian cells for treating anemia -
XX
PS   Claim 1; Fig 5; 18pp; German.
XX
CC   This invention describes a novel nucleic acid molecule (I) encoding an
CC   erythropoietin (EPO) polypeptide (II), transcriptional and translational
CC   regulatory sequences from simian virus 40 (SV40), including the SV40
CC   early promoter and a sequence encoding resistance to an antibiotic. The
CC   product of the invention has antianemic activity. EPO regulates
CC   proliferation and differentiation of late erythrocyte precursor cells.
CC   (I) is used for the recombinant production of human EPO in mammalian
CC   cells. EPO is used, in replacement therapy, to treat anemia. Cells
CC   transformed with (I) produce EPO at a high level (e.g. 1500-1800
CC   international units/ml) which is stable under non-selection conditions.
CC   The plasmid copy number in the cells can be increased without using the
CC   expensive and highly cytostatic agent methotrexate. This sequence
CC   encodes the human erythropoietin protein which is described in the me
CC   of the invention.
XX
SQ   Sequence 2962 BP; 591 A; 895 C; 872 G; 573 T; 31 other;



Query Match 38.5%; Score 15; DB 21; Length 2962;
Best Local Similarity 100.0%; Pred. No. 35;
Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         14 cctcaacccaggcgt 28
              |||||||||||||||
Db        248 cctcaacccaggcgt 262
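In the AAA71992 feature table, the exons and introns should tile the CDS span 625..2772 with no gaps or overlaps and should alternate exon, intron, exon. A short Python check of both properties follows, using the coordinates listed in that entry. The function names are hypothetical.

```python
from typing import List, Tuple

# (feature key, start, end) from the AAA71992 feature table, 1-based inclusive.
FEATURES: List[Tuple[str, int, int]] = [
    ("exon", 625, 640), ("intron", 641, 1201), ("exon", 1202, 1347),
    ("intron", 1348, 1605), ("exon", 1606, 1693), ("intron", 1694, 2304),
    ("exon", 2305, 2484), ("intron", 2485, 2616), ("exon", 2617, 2772),
]
CDS_SPAN = (625, 2772)


def tiles_span(features, span) -> bool:
    """True if the features are contiguous, non-overlapping and cover span exactly."""
    ordered = sorted(features, key=lambda f: f[1])
    if ordered[0][1] != span[0] or ordered[-1][2] != span[1]:
        return False
    return all(prev[2] + 1 == cur[1] for prev, cur in zip(ordered, ordered[1:]))


def alternates(features) -> bool:
    """True if no two neighbouring features (by start position) share a key."""
    keys = [k for k, _, _ in sorted(features, key=lambda f: f[1])]
    return all(a != b for a, b in zip(keys, keys[1:]))


print("tiles CDS span:", tiles_span(FEATURES, CDS_SPAN))
print("exon/intron alternate:", alternates(FEATURES))
```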



RESULT 13
AAN50347
ID   AAN50347 standard; DNA; 3211 BP.
XX
AC   AAN50347;
XX
DT   01-JAN-1980  (first entry)
XX
DE   Positive clone (phage lambda-hE1) isolated from human fetal liver
DE   gene bank.
XX
KW   Erythropoietin; red blood cell; erythrocyte; anaemia; blood;
KW   disorder; ss; phage lambda-hE1; gene bank.
XX
OS   Homo sapiens.
XX
FH   Key             Location/Qualifiers
FT   intron          621..632
FT                   /*tag= a
FT   intron          1072..1218
FT                   /*tag= b
FT   intron          1475..1561
FT                   /*tag= c
FT   intron          2174..2353
FT                   /*tag= d
FT   intron          2488..2640
FT                   /*tag= e
XX
PN   WO8502610-A.
XX
PD   20-JUN-1985.
XX
PF   11-DEC-1984;   84WO-US02021.
XX
PR   30-NOV-1984;   84US-0675298.
PR   13-DEC-1983;   83US-0561024.
PR   21-FEB-1984;   84US-0582185.
PR   28-SEP-1984;   84US-0655841.
XX
PA   (KIRI-) KIRIN-AMGEN INC.
XX
DR   WPI; 1985-159229/26.
DR   P-PSDB; AAP50300.
XX
PT   New polypeptide having properties of erythropoietin - is prepd.
PT   by cultivation of transformed eucaryotic or procaryotic host
XX
PS   Disclosure; Page 43; 113pp; English.
XX
CC   Human erythropoietin encoded by a sequence encoded by this phage
CC   lambda-hE1 is essential for red blood cell formation and is used
CC   for the diagnosis and treatment of blood disorders such as anaemia.
CC   Large amounts of EPO may be obtained using recombinant DNA
CC   techniques in contrast to small amounts obtained from plasma
CC   and urine. This sequence is expressed in E. coli. Bases indicated
CC   by a letter x were undetermined. See also AAN50345-6, AAN50348-50 and
CC   AAP50298-99, AAP50301.
XX
SQ   Sequence 3211 BP; 658 A; 978 C; 929 G; 640 T; 6 other;



Query Match 38.5%; Score 15; DB 6; Length 3211;
Best Local Similarity 100.0%; Pred. No. 35;
Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy         14 cctcaacccaggcgt 28
              |||||||||||||||
Db        248 cctcaacccaggcgt 262



RESULT 14 
AAV30956 

ID   AAV30956 standard; DNA; 3211 BP.
XX
AC   AAV30956;
XX
DT   11-SEP-1998  (first entry)
XX
DE   Human erythropoietin encoding genomic DNA.
XX
KW   Human; erythropoietin; EPO; bone marrow; reticulocyte; red blood cell;
KW   expression; CHO; Chinese hamster ovary cell; diagnosis; blood disorder
KW   ss.
XX
OS   Homo sapiens.
XX
FH   Key             Location/Qualifiers
FT   CDS             621..2643
FT                   /*tag= a
FT                   /product= "erythropoietin"
FT                   /note= "contains introns"
FT   exon            621..633
FT                   /*tag= b
FT                   /number= 1
FT   intron          634..1072
FT                   /*tag= c
FT                   /number= 1
FT   exon            1073..1218
FT                   /*tag= d
FT                   /number= 2
FT   intron          1219..1474
FT                   /*tag= e
FT                   /number= 2
FT   exon            1475..1561
FT                   /*tag= f
FT                   /number= 3
FT   intron          1562..2173
FT                   /*tag= g
FT                   /number= 3
FT   exon            2174..2353
FT                   /*tag= h
FT                   /number= 4
FT   intron          2353..2487
FT                   /*tag= i
FT                   /number= 4
FT   exon            2488..2643
FT                   /*tag= j
FT                   /number= 5
XX
PN   AU688723-B.
XX
PD   19-FEB-1998.
XX
PF   02-DEC-1997;   97AU-0046867.
XX
PR   02-DEC-1997;   97AU-0046867.
XX
PA   (KIRI ) KIRIN AMGEN INC.
XX
PI   Lin F;
XX
DR   WPI; 1998-261957/24.
DR   P-PSDB; AAW58400.
XX
PT   Recombinant human erythropoietin - potentially useful for diagnosis
PT   and treatment of blood disorders
XX
PS   Example 5; Page 39-43; 100pp; English.
XX
CC   The present sequence encodes human erythropoietin (EPO), from an
CC   example of the present invention. The present invention describes
CC   recombinant human EPO which causes bone marrow cells to increase
CC   production of reticulocytes or red blood cells, where the polypeptide
CC   is the product of expression in CHO (Chinese hamster ovary) cells of
CC   an exogenous DNA sequence encoding human EPO. EPO is potentially
CC   useful in the diagnosis and treatment of blood disorders
CC   characterised by low or defective red blood cell production.
XX
SQ   Sequence 3211 BP; 658 A; 979 C; 928 G; 640 T; 6 other;



Query Match 38.5%; Score 15; DB 19; Length 3211;
Best Local Similarity 100.0%; Pred. No. 35;
Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy  14 cctcaacccaggcgt 28
       |||||||||||||||
Db 248 cctcaacccaggcgt 262
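The report does not state the query length. However, every hit is consistent with "Query Match" being the score expressed as a percentage of a 39-nt query: 15/39 rounds to 38.5% and 16/39 rounds to 41.0%. Below is a small Python sketch of that arithmetic. `QUERY_LENGTH = 39` is inferred from these numbers, not a value printed in the report.

```python
# Assumption: the query is 39 nt long (inferred from the reported
# percentages; the report itself does not print it).
QUERY_LENGTH = 39


def query_match_percent(score: int, query_length: int = QUERY_LENGTH) -> float:
    """Score as a percentage of the query length, rounded to one decimal."""
    return round(100.0 * score / query_length, 1)


def span(start: int, end: int) -> int:
    """Number of bases covered by an inclusive 1-based range."""
    return end - start + 1


print(query_match_percent(15), query_match_percent(16))
# Qy 14..28 and Db 248..262 both cover 15 bases, matching "Matches 15".
print(span(14, 28), span(248, 262))
```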




RESULT 15

AAN60518

ID AAN60518 standard; DNA; 3401 BP.
XX
AC AAN60518;
XX
DT 01-JAN-1980 (first entry)
XX
DE Open reading frame coding for the erythropoietin tryptic fragment
DE of lambda HEPO1.
XX
KW Erythropoietin; lambda HEPO1; recombinant plasmid vector; anaemia;
KW mammal cell culture; 3T3; CHO; Chinese hamster ovary; ss.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT intron 617..629
FT /*tag= a
FT intron 1195..1340
FT /*tag= b
FT intron 1598..1684
FT /*tag= c
FT intron 2296..2475
FT /*tag= d
FT intron 2610..2762
FT /*tag= e



XX
PN WO8603520-A.
XX
PD 19-JUN-1986.
XX
PF 03-DEC-1985; 85WO-US02405.
XX
PR 22-JAN-1985; 85US-0693258.
PR 04-DEC-1984; 84US-0677813.
PR 03-JAN-1985; 85US-0688622.
XX
PA (GENE-) GENETICS INST INC.
PA (FRIT/) FRITSCHE E.
XX
PI Fritsch E, Hewick RM, Jacobs K;
XX
DR WPI; 1986-169459/26.
DR P-PSDB; AAP60598.
XX
PT Prodn. of human cDNA clone expressing erythropoietin - for mass
PT prodn. of erythropoietin, useful for treating anaemia
XX
PS Disclosure; Page 19; 61pp; English.
XX
CC Recombinant plasmid vector lambda HEPO1 expressing this genomic




CC fragment is expressed in e.g. 3T3 or CHO cell cultures. The 

CC produced erythropoietin is useful for treatment of anaemia, 

CC especially renal anaemia. The cloned gene expresses high levels of 

CC the protein and thus provides a means of mass production. See 

CC also AAN60513-17, AAN60519-21 and AAP60599. 

XX 

SQ Sequence 3401 BP; 698 A; 1033 C; 994 G; 676 T; 0 other; 



Query Match 38.5%; Score 15; DB 7; Length 3401; 

Best Local Similarity 100.0%; Pred. No. 35; 

Matches 15; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy  14 cctcaacccaggcgt 28
       |||||||||||||||
Db 240 cctcaacccaggcgt 254
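Each SQ line lists per-base counts that should add up to the stated sequence length. Checking this is a quick way to catch OCR damage in the numbers. The Python check below covers the two erythropoietin entries. It assumes that "other" counts every non-ACGT symbol, and the dictionary layout and function names are illustrative.

```python
# Base counts copied from the two SQ lines (AU688723-B entry and AAN60518).
# Assumption: "other" covers every symbol that is not A, C, G or T.
ENTRIES = {
    "AU688723-B (3211 BP)": (3211, {"A": 658, "C": 979, "G": 928, "T": 640, "other": 6}),
    "AAN60518 (3401 BP)": (3401, {"A": 698, "C": 1033, "G": 994, "T": 676, "other": 0}),
}


def counts_match_length(length: int, counts: dict[str, int]) -> bool:
    """True when the per-base counts add up to the stated sequence length."""
    return sum(counts.values()) == length


def gc_percent(counts: dict[str, int]) -> float:
    """G+C as a percentage of the unambiguous (A, C, G, T) bases."""
    acgt = sum(counts[base] for base in "ACGT")
    return round(100.0 * (counts["G"] + counts["C"]) / acgt, 1)


for name, (length, counts) in ENTRIES.items():
    print(name, "counts match length:", counts_match_length(length, counts),
          "GC%:", gc_percent(counts))
```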



Search completed: May 7, 2002, 19:18:17 
Job time: 3810 sec 



RESULT 5 

PAE18050 

LOCUS       PAE18050                2310 bp    DNA     linear   BCT 05-JUL-1999
DEFINITION  Pseudomonas aeruginosa intI1, blaVIM and aacA4 (partial) genes.
ACCESSION   Y18050
VERSION     Y18050.1  GI:5420397
KEYWORDS    aacA4 gene; aminoglucoside acetyl-transferase; beta-lactamase;
            blaVIM gene; DNA integrase; intI1 gene.
SOURCE      Pseudomonas aeruginosa.
  ORGANISM  Pseudomonas aeruginosa
            Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae;
            Pseudomonas.
REFERENCE   1  (bases 1 to 2310)
  AUTHORS   Lauretti,L., Riccio,M.L., Mazzariol,A., Cornaglia,G.,
            Amicosante,G., Fontana,R. and Rossolini,G.M.
  TITLE     Cloning and characterization of blaVIM, a new integron-borne
            metallo-beta-lactamase gene from a Pseudomonas aeruginosa clinical
            isolate
  JOURNAL   Antimicrob. Agents Chemother. 43 (7), 1584-1590 (1999)
  MEDLINE   99318582
REFERENCE   2  (bases 1 to 2310)
  AUTHORS   Rossolini,G.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-SEP-1998) G.M. Rossolini, Dip. Biologia
            Moleculare-Sez., Microbiologia, Univ. di Siena, via Laterina N.8,
            I-53100 Siena, ITALY
FEATURES             Location/Qualifiers
     source          1..2310
                     /organism="Pseudomonas aeruginosa"
                     /strain="VR-143/97"
                     /db_xref="taxon:287"
                     /clone="pAC-2AL"
     gene            complement(72..1085)
                     /gene="intI1"
     CDS             complement(72..1085)
                     /gene="intI1"
                     /codon_start=1
                     /transl_table=11
                     /product="DNA integrase"
                     /protein_id="CAB46685.1"
                     /db_xref="GI:5420398"
                     /translation="MKTATAPLPPLRSVKVLDQLRERIRYLHYSLPTEQAYVHWVRAF
                     IRFHGVRHPATLGSSEVEAFLSWLANERKVSVSTHRQAIAALLFFYGKVLCTDLPWLQ
                     EIGRPRPSRRLPWLTPDEWRILGFLEGEHRLFAQLLYGTGMRISEGLQLRVKDLDF
                     DHGTIIVREGKGSKDRALMLPESLAPSLREQLSRARAWWLKDQAEGRSGVALPDALER
                     KYPRAGHSWPWFWVFAQHTHSTDPRSGWRRHHMYDQTFQRAFKRAVEQAGITKPATP
                     HTLRHSFATALLRSGYDIRTVQDLLGHSDVSTTMIYTHVLKVGGAGVRSPLDALPPLT
                     SER"
     gene            1122..2141
                     /gene="blaVIM"
     misc_feature    1122..2135
                     /gene="blaVIM"
                     /note="gene cassete"
     misc_feature    1221..1227
                     /gene="blaVIM"
                     /note="cassette upstream conserved recombination core
                     site"
     CDS             1252..2052
                     /gene="blaVIM"
                     /codon_start=1
                     /transl_table=11
                     /product="beta-lactamase VIM-1"
                     /protein_id="CAB46686.1"
                     /db_xref="GI:5420399"
                     /translation="MLKVISSLLVYMTASVMAVASPLAHSGEPSGEYPTVNEIPVGEV
                     RLYQIADGVWSHIATQSFDGAVYPSNGLIVRDGDELLLIDTAWGAKNTAALLAEIEKQ
                     IGLPVTRAVSTHFHDDRVGGVDVLRAAGVATYASPSTRRLAEAEGNEIPTHSLEGLSS
                     SGDAVRFGPVELFYPGAAHSTDNLWYVPSANVLYGGCAVHELSSTSAGNVADADLAE
                     WPTSVERIQKHYPEAEWIPGHGLPGGLDLLQHTANWKAHKNRSVAE"
     misc_feature    2135..2141
                     /gene="blaVIM"
                     /note="cassette downstream conserved recombination core
                     site; cassette inverse recombination core site"
     misc_feature    2136..>2310
                     /gene="aacA4"
                     /note="gene cassete"
     gene            2136..2310
                     /gene="aacA4"
     CDS             2196..>2310
                     /gene="aacA4"
                     /codon_start=1
                     /transl_table=11
                     /product="aminoglucoside acetyl-transferase"
                     /protein_id="CAB46687.1"
                     /db_xref="GI:5420400"
                     /translation="MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQY"
BASE COUNT 499 a 659 c 681 g 471 t

ORIGIN 



Query Match 41.0%; Score 16; DB 1; Length 2310;
Best Local Similarity 100.0%; Pred. No. 39;
Matches 16; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   14 cctcaactcaggcgtt 29
        ||||||||||||||||
Db 2122 CCTCAACTCAGGCGTT 2137
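The feature locations in this record use GenBank's 1-based, inclusive range notation. `complement(...)` marks a feature on the minus strand, and `>` marks a feature that runs past the end of the sequence. A library such as Biopython would normally parse these. The minimal Python sketch below handles only the single-range forms that appear here; `parse_location` and `extract` are hypothetical helper names, and `join()` or `order()` locations are not handled.

```python
import re


def parse_location(text: str) -> tuple[int, int, int]:
    """Parse a single-range GenBank location into (start, end, strand).

    Handles forms such as "1252..2052", "2196..>2310" and
    "complement(72..1085)"; join(), order() and remote entries are not supported.
    """
    loc = text.replace(" ", "")  # also tolerates OCR spacing like "72 . . 1085"
    strand = 1
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    m = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", loc)
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    return int(m.group(1)), int(m.group(2)), strand


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(seq: str, location: str) -> str:
    """Return the feature's bases, reverse-complemented on the minus strand."""
    start, end, strand = parse_location(location)
    fragment = seq[start - 1:end]  # 1-based inclusive -> Python slice
    return fragment.translate(_COMPLEMENT)[::-1] if strand == -1 else fragment


for loc in ("complement(72..1085)", "1252..2052"):
    start, end, strand = parse_location(loc)
    length = end - start + 1
    print(loc, "strand", strand, "length", length, "codon multiple:", length % 3 == 0)
```

The intI1 CDS, complement(72..1085), spans 1014 bp (338 codons), and the blaVIM CDS, 1252..2052, spans 801 bp (267 codons). Both lengths are whole numbers of codons.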



RESULT 9 

HSPLGLN 

LOCUS       HSPLGLN                 3490 bp    mRNA    linear   PRI 14-DEC-1995
DEFINITION  H. sapiens mRNA for plakoglobin.
ACCESSION   Z68228
VERSION     Z68228.1  GI:1122888
KEYWORDS    plakoglobin.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3490)
  AUTHORS   Franke,W.W., Goldschmidt,M.D., Zimbelmann,R., Mueller,H.M.,
            Schiller,D.L. and Cowin,P.
  TITLE     Molecular cloning and amino acid sequence of human plakoglobin, the
            common junctional plaque protein
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 86 (11), 4027-4031 (1989)
  MEDLINE   89264555
REFERENCE   2  (bases 1 to 3490)
  AUTHORS   Zimbelmann,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-DEC-1995) Zimbelmann R., German Cancer Research
            Center, Division for Cell Biology, Im Neuenheimer Feld 280, D-69120
            Heidelberg, Federal Republik of Germany
FEATURES             Location/Qualifiers
     source          1..3490
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="HPG Ca 5.1"
                     /dev_stage="adult"
     CDS             120..2357
                     /codon_start=1
                     /product="plakoglobin"
                     /protein_id="CAA92522.1"
                     /db_xref="GI:1122889"
                     /db_xref="SPTREMBL:Q15151"
                     /translation="MEVMNLMEQPIKVTEWQQTYTYDSGIHSGANTCVPSVSSKGIME
                     EDEACGRQYTLKKTTTYTQGVPPSQGDLEYQMSTTARAKRVREAMCPGVSGEDSSLLL
                     ATQVEGQATNLQRLAEPSQLLKSAIVHLINYQDDAELATRALPELTKLLNDEDPWVT
                     KAAMIVNQLSKKEASRRALMGSPQLVAAWRTMQNTSDLDTARCTTSILHNLSHHREG
                     LLAIFKSGGIPALVRMLSSPVESVLFYAITTLHNLLLYQEGAKMAVRLADGLQKMVPL
                     LNKNNPKFLAITTDCLQLLAYGNQESKLIILANGGPQALVQIMRNYSYEKLLWTTSRV
                     LKVLSVCPSNKPAIVEAGGMQALGKHLTSNSPRLVQNCLWTLRNLSDVATKQEGLESV
                     LKILVNQLSVDDVNVLTCATGTLSNLTCNNSKNKTLVTQNSGVEALIHAILRAGDKDD
                     ITEPAVCALRHLTSRHPEAEMAQNSVRLNYGIPAIVKLLNQPNQWPLVKATIGLIRNL
                     ALCPANHAPLQEAAVIPRLVQLLVKAHQDAQRHVAAGTQQPYTDGVRMEEIVEGCTGA
                     LHILARDPMNRMEIFRLNTIPLFVQLLYSSVENIQRVAAGVLCELAQDKEAADAIDAE
                     GASAPLMELLHSRNEGTATYAAAVLFRISEDKNPDYRKRVSVELTNSLFKHDPAAWEA
                     AQSMIPINEPYGDDMDATYRPMYSSDVPLDPLEMHMDMDGDYPIDTYSDGLRPPYPTA
                     DHMLA"
     polyA_signal    3475..3480
BASE COUNT 672 a 1172 c 979 g 667 t
ORIGIN



Query Match 41.0%; Score 16; DB 9; Length 3490;
Best Local Similarity 100.0%; Pred. No. 41;
Matches 16; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    3 gctctgcgccacctca 18
        ||||||||||||||||
Db 1491 GCTCTGCGCCACCTCA 1506



RESULT 10 

G31640 

LOCUS       G31640                  3490 bp    DNA     linear   STS 28-SEP-1998
DEFINITION  sWSS397 Eric D. Green Homo sapiens STS genomic, sequence tagged
            site.
ACCESSION   G31640
VERSION     G31640.1  GI:1916365
KEYWORDS    STS.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3490)
  AUTHORS   Bouffard,G.G., Iyer,L.M., Idol,J.R., Braden,V.V., Cunningham,A.F.,
            Weintraub,L.A., Mohr-Tidwell,R.M., Peluso,D.C., Fulton,R.S.,
            Leckie,M.P. and Green,E.D.
  TITLE     A collection of 1814 human chromosome 7-specific STSs
  JOURNAL   Genome Res. 7 (1), 59-64 (1997)
  MEDLINE   97189344
   PUBMED   9037602
REFERENCE   2  (bases 1 to 3490)
  AUTHORS   Green,E.D.
  TITLE     Human chromosome 7 STSs (1997)
  JOURNAL   Unpublished
COMMENT     Synonyms: JUP
            GDB: GDB:185732
            GDB_DSEG: JUP
            Contact: Eric D. Green
            Genome Technology Branch
            National Human Genome Research Institute/NIH
            49 Convent Dr., MSC4431, Bldg 49, Rm. 2A08, Bethesda, MD 20892
            Tel: 3014020201
            Fax: 3014024735
            Email: egreen@nhgri.nih.gov
            Primer A: GAGGCGTCGCGGCGGGC
            Primer B: GGTACAGGAGCAGGTTG
            STS size: 253
            PCR Profile:
                 Presoak:         0 degrees C for 0.00 minute(s)
                 Denaturation:    92 degrees C for 1.00 minute(s)
                 Annealing:       60 degrees C for 2.00 minute(s)
                 Polymerization:  72 degrees C for 2.00 minute(s)
                 PCR Cycles:      35
                 Thermal Cycler:  PerkinElmer TC
            Protocol:
                 Template:        30-100 ng
                 Primer:          each 1 uM
                 dNTPs:           each 200 uM
                 Taq Polymerase:  0.05 units/ul
                 Total Vol:       5 ul
            Buffer:
                 MgCl2:           2.5 mM
                 KCl:             50 mM
                 Tris-HCl:        10 mM
                 pH:              8.3
            This STS has been incorporated into the NHGRI chromosome 7
            physical map, but was developed by another investigator. See
            GenBank record: Z68228. For additional information about the NHGRI
            chromosome 7 mapping project, see
            http://www.nhgri.nih.gov/DIR/GTB/CHR7. Also see Genomics
            11:548-64 (1991) [MUID=92128937].
FEATURES             Location/Qualifiers
     source          1..3490
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="7"
                     /clone_lib="Eric D. Green"
     gene            1..3490
                     /gene="JUP"
     STS             636..888
                     /gene="JUP"
     primer_bind     636..652
                     /gene="JUP"
     primer_bind     complement(872..888)

BASE COUNT 672 a 1171 c 980 g 667 t 

ORIGIN 



Query Match 41.0%; Score 16; DB 11; Length 3490;
Best Local Similarity 100.0%; Pred. No. 41;
Matches 16; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    3 gctctgcgccacctca 18
        ||||||||||||||||
Db 1491 GCTCTGCGCCACCTCA 1506
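The STS annotation in G31640 can be cross-checked against the primer data in its COMMENT. The STS spans 636..888, so its size is 888 - 636 + 1 = 253 bp, and each primer_bind interval is 17 bp, the length of the matching primer. The Python sketch below runs these checks using only values copied from the record; the constant and helper names are hypothetical.

```python
# Values copied from the G31640 record.
PRIMER_A = "GAGGCGTCGCGGCGGGC"  # primer_bind 636..652 (same sequence as the top strand)
PRIMER_B = "GGTACAGGAGCAGGTTG"  # primer_bind complement(872..888)
STS = (636, 888)
SITE_A = (636, 652)
SITE_B = (872, 888)


def span(interval: tuple[int, int]) -> int:
    """Length of an inclusive 1-based interval."""
    start, end = interval
    return end - start + 1


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print("STS size:", span(STS))
print("primer A fits its site:", len(PRIMER_A) == span(SITE_A))
print("primer B fits its site:", len(PRIMER_B) == span(SITE_B))
# Primer B is annotated on the complementary strand, so the top-strand
# sequence at 872..888 should read as its reverse complement.
print("expected top strand at 872..888:", reverse_complement(PRIMER_B))
```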



(FILE 'HOME' ENTERED AT 14:51:31 ON 12 JUN 2002)

FILE 'MEDLINE, BIOSIS, CAPLUS' ENTERED AT 14:51:45 ON 12 JUN 2002
L1 66 S MUCIN8 OR MUC8 OR (MUCIN 8) OR (MUC 8)
L2 8 S L1 AND (GENOMIC OR CLON?)
L3 5 DUP REM L2 (3 DUPLICATES REMOVED)

(FILE 'HOME' ENTERED AT 18:16:14 ON 11 JUN 2002)
FILE 'MEDLINE, BIOSIS, CAPLUS' ENTERED AT 18:17:50 ON 11 JUN 2002
L1 32550 S MUCIN
L2 55 S L1 AND SPLICE
L3 28 DUP REM L2 (27 DUPLICATES REMOVED)
FILE 'STNGUIDE' ENTERED AT 18:19:44 ON 11 JUN 2002
FILE 'MEDLINE, BIOSIS, CAPLUS' ENTERED AT 18:29:16 ON 11 JUN 2002
L4 0 S L3 AND (MUC8 OR MUCIN 8)
L5 54 S (MUCIN 8) OR MUC8
L6 0 S L5 AND (VARIANT OR POLYMORPH? OR SNP)
L7 33 DUP REM L5 (21 DUPLICATES REMOVED)
L8 79 S RP11
L9 0 S RP11-0702?
L10 21 S L8(5A) (CLONE OR BAC)
L11 13 DUP REM L10 (8 DUPLICATES REMOVED)
L12 21 S ((BACTERIAL ARTIFICIAL) OR BAC) (3A) (CHROMOSOME (3A) 7 ?)
L13 51 S ((BACTERIAL ARTIFICIAL) OR BAC) (3A) (CHROMOSOME (3A) 7######)
L14 16 DUP REM L13 (35 DUPLICATES REMOVED)
L15 24 S ((BACTERIAL ARTIFICIAL) OR BAC) (3A) (CHROMOSOME (3A) 12######)
L16 14 DUP REM L15 (10 DUPLICATES REMOVED)
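Each DUP REM line reports how many answers remain after cross-file duplicates are removed. The remaining count should therefore equal the size of the input answer set minus the duplicates removed. The Python check below runs over the counts from the two sessions; the tuple layout is illustrative.

```python
# (result set, input set, answers in input, duplicates removed, answers kept)
DUP_REM_STEPS = [
    ("L3 (12 JUN)", "L2", 8, 3, 5),
    ("L3 (11 JUN)", "L2", 55, 27, 28),
    ("L7", "L5", 54, 21, 33),
    ("L11", "L10", 21, 8, 13),
    ("L14", "L13", 51, 35, 16),
    ("L16", "L15", 24, 10, 14),
]

for name, source, before, removed, kept in DUP_REM_STEPS:
    ok = before - removed == kept
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: {before} in {source} - {removed} duplicates = {kept} ({status})")
```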



